Publications (10 of 10)
Karlsson, M. & Hössjer, O. (2023). Classification Under Partial Reject Options. Journal of Classification, Article ID s00357-023-09455-x.
2023 (English). In: Journal of Classification, ISSN 0176-4268, E-ISSN 1432-1343, article ID s00357-023-09455-x. Journal article (Refereed). Epub ahead of print
Abstract [en]

In many applications there is ambiguity about which (if any) of a finite number N of hypotheses best fits an observation. It is of interest then to possibly output a whole set of categories, that is, a scenario where the size of the classified set of categories ranges from 0 to N. Empty sets correspond to an outlier, sets of size 1 represent a firm decision that singles out one hypothesis, sets of size N correspond to a rejection to classify, whereas sets of sizes 2,..., N - 1 represent a partial rejection to classify, where some hypotheses are excluded from further analysis. In this paper, we review and unify several proposed methods of Bayesian set-valued classification, where the objective is to find the optimal Bayesian classifier that maximizes the expected reward. We study a large class of reward functions with rewards for sets that include the true category, whereas additive or multiplicative penalties are incurred for sets depending on their size. For models with one homogeneous block of hypotheses, we provide general expressions for the accompanying Bayesian classifier, several of which extend previous results in the literature. Then, we derive novel results for the more general setting when hypotheses are partitioned into blocks, where ambiguity within and between blocks are of different severity. We also discuss how well-known methods of classification, such as conformal prediction, indifference zones, and hierarchical classification, fit into our framework. Finally, set-valued classification is illustrated using an ornithological data set, with taxa partitioned into blocks and parameters estimated using MCMC. The associated reward function's tuning parameters are chosen through cross-validation.
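
The size-penalized rewards described in this abstract can be illustrated with a minimal Python sketch. It assumes one particular additive reward, R(S, i) = 1{i in S} - c * |S|, which is a simplification chosen here for illustration and not necessarily the form used in the paper; the function names and the penalty `c` are hypothetical. Under this assumption the expected reward of a set S is the sum of (p_i - c) over the categories in S, so the maximizer keeps exactly the categories whose posterior probability p_i exceeds c.

```python
from itertools import combinations


def expected_reward(posterior, subset, c):
    # Assumed reward: 1 if the true category is in the set, minus c per listed category.
    return sum(posterior[i] for i in subset) - c * len(subset)


def optimal_set(posterior, c):
    # Each category contributes (p_i - c) to the expected reward,
    # so keep exactly those with p_i > c.
    return {i for i, p in enumerate(posterior) if p > c}


def brute_force_set(posterior, c):
    # Exhaustive check over all non-empty subsets (only feasible for small N).
    n = len(posterior)
    best, best_value = set(), 0.0  # the empty set has expected reward 0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            value = expected_reward(posterior, subset, c)
            if value > best_value:
                best, best_value = set(subset), value
    return best


posterior = [0.55, 0.35, 0.10]  # hypothetical posterior probabilities
for c in (0.6, 0.4, 0.2, 0.05):
    print(c, sorted(optimal_set(posterior, c)))
```

With these hypothetical values, the largest penalty yields the empty set (an outlier), the intermediate penalties yield a single category or a pair (a firm decision or a partial rejection), and the smallest penalty yields all three categories (a rejection to classify).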

Keywords
Blockwise cross-validation, Bayesian classification, Conformal prediction, Classes of hypotheses, Indifference zones, Markov Chain Monte Carlo, Reward functions with set-valued inputs, Set-valued classifiers
National subject category
Mathematics; Psychology
Identifiers
urn:nbn:se:su:diva-225421 (URN); 10.1007/s00357-023-09455-x (DOI); 001113203500001 ()
Available from: 2024-01-16. Created: 2024-01-16. Last updated: 2024-01-16.
Karlsson, M. & Hössjer, O. (2023). Identification of taxon through classification with partial reject options. The Journal of the Royal Statistical Society, Series C: Applied Statistics, 72(4), 937-975
2023 (English). In: The Journal of the Royal Statistical Society, Series C: Applied Statistics, ISSN 0035-9254, E-ISSN 1467-9876, Vol. 72, no. 4, p. 937-975. Journal article (Refereed). Published
Abstract [en]

Identification of taxa can significantly be assisted by statistical classification based on trait measurements either individually or by phylogenetic (clustering) methods. In this article, we present a general Bayesian approach for classifying species individually based on measurements of a mixture of continuous and ordinal traits, and any type of covariates. The trait vector is derived from a latent variable with a multivariate Gaussian distribution. Decision rules based on supervised learning are presented that estimate model parameters through blocked Gibbs sampling. These decision regions allow for uncertainty (partial rejection), so that not necessarily one specific category (taxon) is output when new subjects are classified, but rather a set of categories including the most probable taxa. This type of discriminant analysis employs reward functions with a set-valued input argument, so that an optimal Bayes classifier can be defined. We also present a way of safeguarding against outlying new observations, using an analogue of a p-value within our Bayesian setting. We refer to our Bayesian set-valued classifier as the Karlsson–Hössjer method, and it is illustrated on an original ornithological data set of birds. We also incorporate model selection through cross-validation, exemplified on another original data set of birds. 

Keywords
Bayesian classification, classification with covariates, partial observations, set-valued classifiers, species identification, statistical ornithology
National subject category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-203752 (URN); 10.1093/jrsssc/qlad036 (DOI); 001019502200001 ()
Available from: 2022-04-21. Created: 2022-04-21. Last updated: 2023-12-19. Bibliographically approved.
Hössjer, O. & Karlsson, M. (2023). On the use of L-functionals in regression models. Open Mathematics, 21(1), Article ID 20220597.
2023 (English). In: Open Mathematics, ISSN 2391-5455, Vol. 21, no. 1, article ID 20220597. Article, review/survey (Refereed). Published
Abstract [en]

In this article, we survey and unify a large class of L-functionals of the conditional distribution of the response variable in regression models. This includes robust measures of location, scale, skewness, and heavy-tailedness of the response, conditionally on covariates. We generalize the concepts of L-moments (G. Sillito, Derivation of approximants to the inverse distribution function of a continuous univariate population from the order statistics of a sample, Biometrika 56 (1969), no. 3, 641–650.), L-skewness, and L-kurtosis (J. R. M. Hosking, L-moments: analysis and estimation of distributions using linear combinations of order statistics, J. R. Stat. Soc. Ser. B Stat. Methodol. 52 (1990), no. 1, 105–124.) and introduce order numbers for a large class of L-functionals through orthogonal series expansions of quantile functions. In particular, we motivate why location, scale, skewness, and heavy-tailedness have order numbers 1, 2, (3,2), and (4,2), respectively, and describe how a family of L-functionals, with different order numbers, is constructed from Legendre, Hermite, Laguerre, or other types of polynomials. Our framework is applied to models where the relationship between quantiles of the response and the covariates follows a transformed linear model, with a link function that determines the appropriate class of L-functionals. In this setting, the distribution of the response is treated parametrically or nonparametrically, and the response variable is either censored/truncated or not. We also provide a framework for asymptotic theory of estimates of L-functionals and illustrate our approach by analyzing the arrival time distribution of migrating birds. In this context, a novel version of the coefficient of determination is introduced, which makes use of the above-mentioned orthogonal series expansion.
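
As a small illustration of the classical quantities this abstract generalizes, the sketch below computes the first sample L-moment, the second sample L-moment, the L-skewness and the L-kurtosis from unbiased probability-weighted moments, following the standard formulas associated with Hosking (1990). It treats a plain unconditional sample with no covariates, so it does not implement the conditional L-functionals or order numbers of the article; the function name `sample_l_moments` is hypothetical.

```python
def sample_l_moments(data):
    """Return (l1, l2, L-skewness, L-kurtosis) for a sample with n >= 4."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")

    # Unbiased probability-weighted moments b_0, ..., b_3.
    b = [0.0, 0.0, 0.0, 0.0]
    for j, xj in enumerate(x):  # j is the zero-based rank of xj
        w = 1.0
        for r in range(4):
            b[r] += w * xj / n
            if r < 3:
                # Build the weight j(j-1)...(j-r) / ((n-1)(n-2)...(n-1-r)).
                w *= (j - r) / (n - 1 - r)

    l1 = b[0]                                       # location
    l2 = 2 * b[1] - b[0]                            # scale
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    # The ratios are undefined when l2 == 0 (all observations equal).
    return l1, l2, l3 / l2, l4 / l2


print(sample_l_moments([1, 2, 3, 4, 5]))
```

For the evenly spaced sample above, the location is 3, the scale is 1, and both L-skewness and L-kurtosis vanish up to rounding, as expected for a symmetric, uniform-like sample.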

Keywords
bird phenology, coefficient of determination, L-functionals, L-statistics, order numbers, orthogonal series expansion, quantile function, quantile regression
National subject category
Probability Theory and Statistics
Identifiers
urn:nbn:se:su:diva-203755 (URN); 10.1515/math-2022-0597 (DOI); 001053084400001 (); 2-s2.0-85170428452 (Scopus ID)
Available from: 2022-04-21. Created: 2022-04-21. Last updated: 2023-09-21. Bibliographically approved.
Karlsson, M. (2022). Statistical Methods for Taxon Classification and Bird Migration Phenology. (Doctoral dissertation). Stockholm: Department of Mathematics, Stockholm University
2022 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

The connection between ecology and statistics is deep. Methodological advancements in statistics open up new possibilities to understand the distribution of life on earth, and research questions in ecology cause new statistical methods to be developed. The four papers of this thesis exemplify this exchange in providing a statistical approach to taxon classification, and developing novel measures of distributional properties driven by the application area of phenology.

Paper I contains a comprehensive Bayesian approach to phenotypical taxon classification with covariates. We formulate a multivariate regression model for a collection of phenotypical traits, which are assumed to be partial observations of latent variables with a Gaussian distribution. Through blocked Gibbs sampling we estimate the parameters of these distributions for a real data set, and derive decision regions of new observations in terms of set-valued classifiers, called Karlsson-Hössjer (K-H) classifiers, analogous to partial reject options. We introduce model selection through cross-validation and compare the K-H classifier’s performance with other existing methods on real data.

Paper II introduces a general Bayesian framework for K-H classification. This is achieved by using a reward function with a set-valued argument, and in this context we derive the optimal Bayes classifier, for a homogeneous block of hypotheses as well as for scenarios where the hypotheses are divided into blocks, and where misclassification or ambiguity within blocks is less or more serious than between. These reward functions include tuning parameters which we choose using cross-validation, and we apply the method to a real data set with block structure.

In Paper III a large class of L-functionals is studied for the response variable in regression models. These L-functionals are given order numbers through an orthogonal series expansion of the quantile function of the response variable. We apply the framework to quantile regression models with and without transformations of the outcome variable, and present a unified asymptotic theory for estimates of L-functionals. The derived estimators are applied to a quantile regression model for phenological analysis, and in this context a novel version of the coefficient of determination is introduced.

In Paper IV two statistical approaches for phenological analysis are compared, for singular as well as for multiple species models. For singular species, we show that the estimates from linear models fitted to empirical quantiles of the response distribution give less detailed results on the effects of covariates compared to non-parametric quantile regression. For multiple species models, we highlight an identifiability issue in quantile regression with random effects, and deduce similarity of performance of a mixed effects linear model for empirical quantiles and a quantile regression model with species as one of the covariates.

Place, publisher, year, edition, pages
Stockholm: Department of Mathematics, Stockholm University, 2022. p. 39
Keywords
Classification, quantile regression, phenology, statistical ornithology, L-functionals, set-valued classification, species identification, statistical ecology, multispecies modelling
National subject category
Probability Theory and Statistics; Ecology
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-204128 (URN); 978-91-7911-892-1 (ISBN); 978-91-7911-893-8 (ISBN)
Public defence
2022-06-07, room 15, building 5, Kräftriket, Roslagsvägen 101, and online via Zoom (public link available at the department website), Stockholm, 09:00 (English)
Opponent
Supervisors
Available from: 2022-05-13. Created: 2022-04-21. Last updated: 2022-05-06. Bibliographically approved.
Persson, S., Alm, E., Karlsson, M., Enkirch, T., Norder, H., Eriksson, R., . . . Ellström, P. (2021). A new assay for quantitative detection of hepatitis A virus. Journal of Virological Methods, 288, Article ID 114010.
2021 (English). In: Journal of Virological Methods, ISSN 0166-0934, E-ISSN 1879-0984, Vol. 288, article ID 114010. Journal article (Refereed). Published
Abstract [en]

Hepatitis A virus (HAV) is mainly transmitted via contaminated food or water or through person-to-person contact. Here, we describe development and evaluation of a reverse transcription droplet digital PCR (RT-ddPCR) and reverse transcription real-time PCR (RT-qPCR) assay for detection of HAV in food and clinical specimens. The assay was evaluated by assessing limit of detection, precision, matrix effects, sensitivity and quantitative agreement. The 95% limit of detection (LOD95%) was 10% higher for RT-ddPCR than for RT-qPCR. A Bayesian model was used to estimate precision on different target concentrations. From this, we found that RT-ddPCR had somewhat greater precision than RT-qPCR within runs and markedly greater precision between runs. By analysing serum from naturally infected persons and a naturally contaminated food sample, we found that the two methods agreed well in quantification and had comparable sensitivities. Tests with artificially contaminated food samples revealed that neither RT-ddPCR nor RT-qPCR was severely inhibited by presence of oysters, raspberries, blueberries or leafy-green vegetables. For this assay, we conclude that RT-qPCR should be considered if rapid, qualitative detection is the main interest and that RT-ddPCR should be considered if precise quantification is the main interest. The high precision of RT-ddPCR allows for detection of small changes in viral concentration over time, which has direct implications for both food control and clinical studies.

Keywords
Hepatitis A virus, Digital PCR, Real-time PCR, Reverse transcription, Validation, Food-borne virus
National subject category
Microbiology in the medical area; Biological Sciences
Identifiers
urn:nbn:se:su:diva-190034 (URN); 10.1016/j.jviromet.2020.114010 (DOI); 000604174900004 (); 33152410 (PubMedID)
Available from: 2021-02-24. Created: 2021-02-24. Last updated: 2022-02-25. Bibliographically approved.
Persson, S., Karlsson, M., Borsch-Reniers, H., Ellström, P., Eriksson, R. & Simonsson, M. (2019). Missing the Match Might Not Cost You the Game: Primer-Template Mismatches Studied in Different Hepatitis A Virus Variants. Food and Environmental Virology, 11(3), 297-308
2019 (English). In: Food and Environmental Virology, ISSN 1867-0334, E-ISSN 1867-0342, Vol. 11, no. 3, p. 297-308. Journal article (Refereed). Published
Abstract [en]

Mismatches between template sequences and reverse transcription (RT) or polymerase chain reaction (PCR) primers can lead to underestimation or false negative results during detection and quantification of sequence-diverse viruses. We performed an in silico inclusivity analysis of a widely used RT-PCR assay for detection of hepatitis A virus (HAV) in food, described in ISO 15216-1. One of the most common mismatches found was a single G (primer) to U (template) mismatch located at the terminal 3'-end of the reverse primer region. This mismatch was present in all genotype III sequences available in GenBank. Partial HAV genomes with common or potentially severe mismatches were produced by in vitro transcription and analysed using RT-ddPCR and RT-qPCR. When using standard conditions for RT-qPCR, the mismatch identified resulted in underestimation of the template concentration by a factor of 1.7-1.8 and an increase in 95% limit of detection from 8.6 to 19 copies/reaction. The effect of this mismatch was verified using full-length viral genomes. Here, the same mismatch resulted in underestimation of the template concentration by a factor of 2.8. For the partial genomes, the presence of additional mismatches resulted in underestimation of the template concentration by up to a factor of 232. Quantification by RT-ddPCR and RT-qPCR was equally affected during analysis of RNA templates with mismatches within the reverse primer region. However, on analysing DNA templates with the same mismatches, we found that ddPCR quantification was less affected by mismatches than qPCR due to the end-point detection technique.

Keywords
Digital PCR, Real-time PCR, Reverse transcription, Primer, Mismatch, Hepatitis A virus
National subject category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-173094 (URN); 10.1007/s12560-019-09387-z (DOI); 000480501400011 (); 31004336 (PubMedID)
Available from: 2019-10-07. Created: 2019-10-07. Last updated: 2022-03-23. Bibliographically approved.
Lehikoinen, A., Lindén, A., Karlsson, M., Andersson, A., Crewe, T. L., Dunn, E. H., . . . Skjold Tjørnløv, R. (2019). Phenology of the avian spring migratory passage in Europe and North America: Asymmetric advancement in time and increase in duration. Ecological Indicators, 101, 985-991
2019 (English). In: Ecological Indicators, ISSN 1470-160X, E-ISSN 1872-7034, Vol. 101, p. 985-991. Journal article (Refereed). Published
Abstract [en]

Climate change has been shown to shift the seasonal timing (i.e. phenology) and distribution of species. The phenological effects of climate change on living organisms have often been tested using first occurrence dates, which may be uninformative and biased. More rarely investigated is how different phases of a phenological sequence (e.g. beginning, central tendency and end) or its duration have changed over time. This type of analysis requires continuous observation throughout the phenological event over multiple years, and such data sets are rare. In this study we examined the impact of temperature on long-term change of passage timing and duration of the spring migration period in birds, and which species' traits explain species-specific variation. Data used covered 195 species from 21 European and Canadian bird observatories from which systematic daily sampling protocols were available. Migration dates were negatively associated with early spring temperature and timings had in general advanced in 57 years. Short-distance migrants advanced the beginning of their migration more than long-distance migrants when corrected for phylogenic relatedness, but such a difference was not found in other phases of migration. The advancement of migration has generally been greater for the beginning and median phases of migration relative to the end, leading to extended spring migration seasons. Duration of the migration season increased with increasing temperature. Phenological changes have also been less noticeable in Canada even when corrected for rate of change in temperature. To visualize long-term changes in phenology, we constructed the first multi-species spring migration phenology indicator to describe general changes in median migration dates in the northern hemisphere. The indicator showed an average advancement of one week during five decades across the continents (period 1959-2015). 
The indicator is easy to update with new data and we therefore encourage future research to investigate whether the trend towards longer periods of occurrence or emergence in spring is also evident in other migratory populations. Such phenological changes may influence detectability in monitoring schemes, and may have broader implications on population and community dynamics.

Keywords
Avian movement, Environmental change, Global warming, Long-term monitoring
National subject category
Biological Sciences; Earth and Related Environmental Sciences
Identifiers
urn:nbn:se:su:diva-170094 (URN); 10.1016/j.ecolind.2019.01.083 (DOI); 000470963300099 ()
Available from: 2019-07-03. Created: 2019-07-03. Last updated: 2022-02-26. Bibliographically approved.
Fransson, T., Karlsson, M., Kullberg, C., Stach, R. & Barboutis, C. (2017). Inability to regain normal body mass despite extensive refuelling in great reed warblers following the trans‐Sahara crossing during spring migration. Journal of Avian Biology, 48(1), 58-65
2017 (English). In: Journal of Avian Biology, ISSN 0908-8857, E-ISSN 1600-048X, Vol. 48, no. 1, p. 58-65. Journal article (Refereed). Published
Abstract [en]

Migratory birds wintering in Africa face the challenge of passing the Sahara desert with few opportunities to forage. During spring migration birds thus arrive in the Mediterranean area with very low energy reserves after crossing the desert. Since early arrival to the breeding grounds often is of importance to maximize reproductive success, finding stopover sites with good refuelling possibilities after the Saharan passage is of utmost importance. Here we report on extensive fuelling in the great reed warbler Acrocephalus arundinaceus on the south coast of Crete in spring, the first land that they encounter after crossing the Sahara desert and the Mediterranean Sea in this area. Birds were studied at a river mouth and due to an exceptionally high recapture rate (45 and 51% in two successive years), we were able to get information about stopover behaviour in 56 individual great reed warblers during two spring seasons. The large proportion of trapped great reed warblers compared to other species and the large number of recaptures suggest that great reed warblers actively choose this area for stopover. They stayed on average 3-4 d, increased on average about 3.5 g in body mass and the average rate of body mass increase was 4.8% of lean body mass d⁻¹. Wing length affected the rate of increase and indicated that females have a slower increase than males. The results found show that great reed warblers at this site regularly deposit larger fuel loads than needed for one continued flight stage. The low body mass found in great reed warblers (also in birds with high fat scores) is a strong indication that birds staging at Anapodaris still had not been able to rebuild their structural tissue after the strenuous Sahara crossing, suggesting that rebuilding structural tissue may take a longer time than previously thought.

National subject category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-140017 (URN); 10.1111/jav.01250 (DOI); 000395032800006 ()
Available from: 2017-02-24. Created: 2017-02-24. Last updated: 2022-02-28. Bibliographically approved.
Karlsson, M. & Hössjer, O. A comparison between quantile regression and linear regression on empirical quantiles for phenological analysis in migratory response to climate change.
(English). Manuscript (preprint) (Other (popular science, discussion, etc.))
Abstract [en]

It is well established that migratory birds in general have advanced their arrival times in spring, and in this paper we investigate potential ways of enhancing the level of detail in future phenological analyses. We perform single as well as multiple species analyses, using linear models on empirical quantiles, non-parametric quantile regression and likelihood-based parametric quantile regression with asymmetric Laplace distributed error terms. We conclude that non-parametric quantile regression appears most suited for single as well as multiple species analyses.
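
To make the term "non-parametric quantile regression" concrete, the sketch below shows the check (pinball) loss that such regressions minimize, in the simplest intercept-only case with no covariates. It is a simplified illustration, not the analysis of the manuscript; the function names and the arrival-day values are hypothetical. Because the total check loss is piecewise linear and convex in the fitted value, a minimizer can always be found among the observed data points.

```python
def pinball_loss(residual, tau):
    # Check loss: tau * u for u >= 0 and (tau - 1) * u for u < 0.
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(y, tau):
    # Intercept-only quantile regression: choose the observed value that
    # minimizes the total check loss, which is a sample tau-quantile.
    return min(sorted(y), key=lambda q: sum(pinball_loss(v - q, tau) for v in y))


arrival_days = [118, 120, 121, 123, 140]  # hypothetical arrival days of the year
for tau in (0.1, 0.5, 0.9):
    print(tau, constant_quantile_fit(arrival_days, tau))
```

Fitting several values of `tau` separately, as in the loop, is what lets a quantile-based analysis describe the early, central and late parts of an arrival distribution rather than only its mean.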

Keywords
Phenology, quantile regression, mixed effects, arrival times, linear regression, bird observatory
National subject category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-203757 (URN); arXiv.2202.02206 (DOI)
Available from: 2022-04-21. Created: 2022-04-21. Last updated: 2022-04-21.
Karlsson, M. & Hössjer, O. Classification under partial reject options.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

We study set-valued classification for a Bayesian model where data originates from one of a finite number N of possible hypotheses. Thus we consider the scenario where the size of the classified set of categories ranges from 0 to N. Empty sets correspond to an outlier, size 1 represents a firm decision that singles out one hypothesis, size N corresponds to a rejection to classify, whereas sizes 2, …, N−1 represent a partial rejection, where some hypotheses are excluded from further analysis. We introduce a general framework of reward functions with a set-valued argument and derive the corresponding optimal Bayes classifiers, for a homogeneous block of hypotheses and for when hypotheses are partitioned into blocks, where ambiguity within and between blocks are of different severity. We illustrate classification using an ornithological dataset, with taxa partitioned into blocks and parameters estimated using MCMC. The associated reward function's tuning parameters are chosen through cross-validation.

Keywords
Blockwise cross-validation, Bayesian classification, conformal prediction, classes of hypotheses, indifference zones, Markov Chain Monte Carlo, reward functions with set-valued inputs, set-valued classifiers
National subject category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-203754 (URN); arXiv.2202.14011 (DOI)
Available from: 2022-04-21. Created: 2022-04-21. Last updated: 2022-04-21.
Identifiers
ORCID iD: orcid.org/0000-0001-9662-507x
