Boström, Henrik
Publications (10 of 84)
Zhao, J., Papapetrou, P., Asker, L. & Boström, H. (2020). Corrigendum to ‘Learning from heterogeneous temporal data in electronic health records’. [J. Biomed. Inform. 65 (2017) 105–119]. Journal of Biomedical Informatics, 101, Article ID 103352.
Corrigendum to ‘Learning from heterogeneous temporal data in electronic health records’. [J. Biomed. Inform. 65 (2017) 105–119]
2020 (English)In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 101, article id 103352Article in journal (Other academic) Published
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-178462 (URN)10.1016/j.jbi.2019.103352 (DOI)
Note

Refers to:

Jing Zhao, Panagiotis Papapetrou, Lars Asker, Henrik Boström

Learning from heterogeneous temporal data in electronic health records

Journal of Biomedical Informatics, Volume 65, January 2017, Pages 105-119

Available from: 2020-01-29 Created: 2020-01-29 Last updated: 2022-02-26 Bibliographically approved
Linusson, H., Johansson, U., Boström, H. & Löfström, T. (2018). Classification With Reject Option Using Conformal Prediction. In: Dinh Phung; Vincent S. Tseng; Geoffrey I. Webb; Bao Ho; Mohadeseh Ganji; Lida Rashidi (Ed.), Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part I. Paper presented at 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018), Melbourne, Australia, June 3-6, 2018 (pp. 94-105). Cham: Springer Nature
Classification With Reject Option Using Conformal Prediction
2018 (English)In: Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part I / [ed] Dinh Phung; Vincent S. Tseng; Geoffrey I. Webb; Bao Ho; Mohadeseh Ganji; Lida Rashidi, Cham: Springer Nature, 2018, p. 94-105Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we propose a practically useful means of interpreting the predictions produced by a conformal classifier. The proposed interpretation leads to a classifier with a reject option that allows the user to limit the number of erroneous predictions made on the test set, without any need to reveal the true labels of the test objects. The method described in this paper works by estimating the cumulative error count on a set of predictions provided by a conformal classifier, ordered by their confidence. Given a test set and a user-specified parameter k, the proposed classification procedure outputs the largest possible number of predictions containing on average at most k errors, while refusing to make predictions for test objects where it is too uncertain. We conduct an empirical evaluation using benchmark datasets, and show that we are able to provide accurate estimates for the error rate on the test set.
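The acceptance rule described in the abstract can be sketched as follows. This is one plausible reading, not the paper's exact estimator: the `confidences` values (one per test object, produced by a conformal classifier) and the budget `k` are assumed inputs, and the expected error of a single prediction is taken as one minus its confidence.

```python
def accept_with_budget(confidences, k):
    """Keep the largest prefix of confidence-ordered predictions whose
    estimated cumulative error count (sum of 1 - confidence) stays <= k."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    kept, err = [], 0.0
    for i in order:
        err += 1.0 - confidences[i]
        if err > k:
            break  # reject this and all less confident predictions
        kept.append(i)
    return kept

# With a tight budget, only the two most confident predictions survive.
print(accept_with_budget([0.99, 0.6, 0.95, 0.5], 0.1))  # -> [0, 2]
```

The indices of rejected objects are simply those not returned, so the user can inspect which test objects the classifier abstained on.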

Place, publisher, year, edition, pages
Cham: Springer Nature, 2018
Series
Lecture Notes in Artificial Intelligence, ISSN 0302-9743, E-ISSN 1611-3349 ; 10937
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:su:diva-192611 (URN)10.1007/978-3-319-93034-3_8 (DOI)000443224400008 ()2-s2.0-85049360232 (Scopus ID)978-3-319-93033-6 (ISBN)978-3-319-93034-3 (ISBN)
Conference
22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018), Melbourne, Australia, June 3-6, 2018
Available from: 2021-04-25 Created: 2021-04-25 Last updated: 2023-10-31 Bibliographically approved
Gurung, R. B., Lindgren, T. & Boström, H. (2018). Learning Random Forest from Histogram Data Using Split Specific Axis Rotation. International Journal of Machine Learning and Computing, 8(1), 74-79
Learning Random Forest from Histogram Data Using Split Specific Axis Rotation
2018 (English)In: International Journal of Machine Learning and Computing, ISSN 2010-3700, Vol. 8, no 1, p. 74-79Article in journal (Refereed) Published
Abstract [en]

Machine learning algorithms for data containing histogram variables have not been explored to any major extent. In this paper, an adapted version of the random forest algorithm is proposed to handle variables of this type, assuming identical structure of the histograms across observations, i.e., the histograms for a variable all use the same number and width of bins. The standard approach of representing bins as separate variables may lead the learning algorithm to overlook the underlying dependencies. In contrast, the proposed algorithm handles each histogram as a unit. When performing split evaluation of a histogram variable during tree growth, a sliding window of fixed size is employed by the proposed algorithm to constrain the sets of bins that are considered together. A small number of all possible sets of bins is randomly selected, and principal component analysis (PCA) is applied locally to all examples in a node. Split evaluation is then performed on each principal component. Results from applying the algorithm to both synthetic and real world data are presented, showing that the proposed algorithm outperforms the standard approach of using random forests together with bins represented as separate variables, with respect to both AUC and accuracy. In addition to introducing the new algorithm, we elaborate on how real world data for predicting NOx sensor failure in heavy duty trucks was prepared, demonstrating that predictive performance can be further improved by adding variables that represent changes of the histograms over time.
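The split-evaluation step described in the abstract can be illustrated with a minimal sketch. The function name and windowing details are illustrative assumptions, not the paper's exact procedure: a window of w consecutive bins is projected onto its locally fitted first principal component, yielding one scalar per example on which ordinary threshold splits can be evaluated.

```python
import numpy as np

def window_pca_feature(H, start, w):
    """H: (n_examples, n_bins) bin counts at a tree node.
    Projects the window H[:, start:start+w] onto its first
    principal component, fitted locally on the node's examples."""
    W = H[:, start:start + w].astype(float)
    W = W - W.mean(axis=0)                 # center locally, per node
    cov = np.cov(W, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    pc1 = vecs[:, np.argmax(vals)]         # direction of largest variance
    return W @ pc1                         # one scalar per example

H = np.array([[5, 3, 0, 1],
              [4, 4, 1, 0],
              [0, 1, 6, 5]])
z = window_pca_feature(H, start=0, w=2)
print(z)  # scalar split feature per example; threshold splits apply as usual
```

Treating the window jointly, rather than bin-by-bin, is what lets the split exploit dependencies between adjacent bins.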

Keywords
Histogram random forest, histogram data, random forest PCA, histogram features
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-156827 (URN)10.18178/ijmlc.2018.8.1.666 (DOI)
Available from: 2018-05-30 Created: 2018-05-30 Last updated: 2022-02-26 Bibliographically approved
Boström, H., Linusson, H., Löfström, T. & Johansson, U. (2017). Accelerating difficulty estimation for conformal regression forests. Paper presented at 5th Symposium on Conformal and Probabilistic Prediction with Applications (COPA), Madrid, Spain, April 20-22, 2016. Annals of Mathematics and Artificial Intelligence, 81(1-2), 125-144
Accelerating difficulty estimation for conformal regression forests
2017 (English)In: Annals of Mathematics and Artificial Intelligence, ISSN 1012-2443, E-ISSN 1573-7470, Vol. 81, no 1-2, p. 125-144Article in journal (Refereed) Published
Abstract [en]

The conformal prediction framework allows for specifying the probability of making incorrect predictions by a user-provided confidence level. In addition to a learning algorithm, the framework requires a real-valued function, called a nonconformity measure, to be specified. The nonconformity measure does not affect the error rate, but the resulting efficiency, i.e., the size of output prediction regions, may vary substantially. A recent large-scale empirical evaluation of conformal regression approaches showed that using random forests as the learning algorithm together with a nonconformity measure based on out-of-bag errors, normalized using a nearest-neighbor-based difficulty estimate, resulted in state-of-the-art performance with respect to efficiency. However, the nearest-neighbor procedure incurs a significant computational cost. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. The evaluation moreover shows that the computational cost of the variance-based measure is several orders of magnitude lower than when employing the nearest-neighbor-based nonconformity measure. The use of out-of-bag instances for calibration does, however, result in nonconformity scores that are distributed differently from those obtained from test instances, questioning the validity of the approach. An adjustment of the variance-based measure is presented, which is shown to be valid and also to have a significant positive effect on the efficiency. For conformal regression forests, the variance-based nonconformity measure is hence a computationally efficient and theoretically well-founded alternative to the nearest-neighbor procedure.
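A variance-normalized nonconformity measure of the kind the abstract describes can be sketched as below. The smoothing term `beta` and the simplified quantile-based calibration are illustrative assumptions; the paper's exact (and adjusted) measure may differ in detail.

```python
import numpy as np

def variance_scores(y, tree_preds, beta=0.1):
    """Nonconformity score |y - mu| / (sigma + beta), where sigma is the
    spread of the individual tree predictions (the difficulty estimate).
    tree_preds: (n_trees, n_examples) predictions on calibration data."""
    mu = tree_preds.mean(axis=0)            # forest prediction
    sigma = tree_preds.std(axis=0)          # tree disagreement
    return np.abs(y - mu) / (sigma + beta)

def interval(tree_preds_new, calib_scores, eps=0.1, beta=0.1):
    """Prediction interval at confidence 1 - eps for one new example,
    using a simplified empirical-quantile calibration."""
    q = np.quantile(calib_scores, 1 - eps)
    mu = tree_preds_new.mean()
    half = q * (tree_preds_new.std() + beta)
    return mu - half, mu + half
```

Examples on which the trees disagree get wider intervals, while sigma itself is a byproduct of the forest, which is why this estimate is so much cheaper than a nearest-neighbor search.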

Keywords
Conformal prediction, Nonconformity measures, Regression, Random forests
National Category
Computer and Information Sciences; Mathematics
Identifiers
urn:nbn:se:su:diva-146954 (URN)10.1007/s10472-017-9539-9 (DOI)000407425000008 ()
Conference
5th Symposium on Conformal and Probabilistic Prediction with Applications (COPA), Madrid, Spain, April 20-22, 2016.
Available from: 2017-09-19 Created: 2017-09-19 Last updated: 2022-03-23 Bibliographically approved
Boström, H., Asker, L., Gurung, R. B., Karlsson, I., Lindgren, T. & Papapetrou, P. (2017). Conformal prediction using random survival forests. In: Xuewen Chen, Bo Luo, Feng Luo, Vasile Palade, M. Arif Wani (Ed.), 16th IEEE International Conference on Machine Learning and Applications: Proceedings. Paper presented at 16th IEEE International Conference On Machine Learning And Applications, Cancun, Mexico, December 18-21, 2017 (pp. 812-817). IEEE
Conformal prediction using random survival forests
2017 (English)In: 16th IEEE International Conference on Machine Learning and Applications: Proceedings / [ed] Xuewen Chen, Bo Luo, Feng Luo, Vasile Palade, M. Arif Wani, IEEE, 2017, p. 812-817Conference paper, Published paper (Refereed)
Abstract [en]

Random survival forests constitute a robust approach to survival modeling, i.e., predicting the probability that an event will occur before or on a given point in time. As with most standard predictive models, no guarantee for the prediction error is provided for this model; instead, the error is typically evaluated empirically. Conformal prediction is a rather recent framework, which allows the error of a model to be determined by a user-specified confidence level, something which is achieved by considering set rather than point predictions. The framework, which has been applied to some of the most popular classification and regression techniques, is here for the first time applied to survival modeling, through random survival forests. An empirical investigation is presented where the technique is evaluated on datasets from two real-world applications; predicting component failure in trucks using operational data and predicting survival and treatment of heart failure patients from administrative healthcare data. The experimental results show that the error levels indeed are very close to the provided confidence levels, as guaranteed by the conformal prediction framework, and that the error for predicting each outcome, i.e., event or no-event, can be controlled separately. The latter may, however, lead to less informative predictions, i.e., larger prediction sets, in case the class distribution is heavily imbalanced.
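Controlling the error per outcome separately, as the abstract describes, is commonly done with label-conditional ("Mondrian") calibration. The sketch below is generic and assumed: nonconformity scores are plain floats, whereas in the paper they would be derived from a random survival forest.

```python
def p_value(score, calib_scores):
    """Conformal p-value: fraction of calibration scores at least as
    nonconforming, with the usual +1 smoothing."""
    return (sum(1 for s in calib_scores if s >= score) + 1) / (len(calib_scores) + 1)

def prediction_set(scores_by_class, calib_by_class, eps):
    """Include every class whose p-value, computed against that class's
    own calibration scores, exceeds the significance level eps."""
    return {c for c, s in scores_by_class.items()
            if p_value(s, calib_by_class[c]) > eps}

calib = {"event": [0.1, 0.2, 0.9], "no-event": [0.3, 0.4]}
print(prediction_set({"event": 0.15, "no-event": 2.0}, calib, eps=0.4))  # -> {'event'}
```

Because each class is calibrated on its own scores, the guaranteed error rate holds per outcome, which is exactly what can make prediction sets large when one class is rare.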

Place, publisher, year, edition, pages
IEEE, 2017
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-149417 (URN)10.1109/ICMLA.2017.00-57 (DOI)000425853000130 ()978-1-5386-1418-1 (ISBN)
Conference
16th IEEE International Conference On Machine Learning And Applications, Cancun, Mexico, December 18-21, 2017
Available from: 2017-11-30 Created: 2017-11-30 Last updated: 2022-02-28 Bibliographically approved
Rebane, J., Karlsson, I., Asker, L., Boström, H. & Papapetrou, P. (2017). Learning from Administrative Health Registries. In: Ricard Gavaldà, Irena Koprinska, Stefan Kramer (Ed.), SoGood 2017: Data Science for Social Good: Proceedings. Paper presented at Second Workshop on Data Science for Social Good co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2017), Skopje, Macedonia, September 18, 2017. CEUR-WS.org
Learning from Administrative Health Registries
2017 (English)In: SoGood 2017: Data Science for Social Good: Proceedings / [ed] Ricard Gavaldà, Irena Koprinska, Stefan Kramer, CEUR-WS.org , 2017Conference paper, Published paper (Refereed)
Abstract [en]

Over the last decades, the healthcare domain has seen a tremendous increase in interest in methods for making inferences about patient care using large quantities of medical data. Such data is often stored in electronic health records and administrative health registries. As these data sources have grown increasingly complex, with millions of patients represented by thousands of attributes, static or time-evolving, finding relevant and accurate patterns that can be used for predictive or descriptive modelling is impractical for human experts. In this paper, we concentrate our review on Swedish Administrative Health Registries (AHRs) and Electronic Health Records (EHRs) and provide an overview of recent and ongoing work in the area, with a focus on adverse drug events (ADEs) and heart failure.

Place, publisher, year, edition, pages
CEUR-WS.org, 2017
Series
CEUR Workshop Proceedings, E-ISSN 1613-0073 ; 1960
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-149269 (URN)
Conference
Second Workshop on Data Science for Social Good co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2017), Skopje, Macedonia, September 18, 2017
Available from: 2017-11-24 Created: 2017-11-24 Last updated: 2022-02-28 Bibliographically approved
Zhao, J., Papapetrou, P., Asker, L. & Boström, H. (2017). Learning from heterogeneous temporal data in electronic health records. Journal of Biomedical Informatics, 65, 105-119
Learning from heterogeneous temporal data in electronic health records
2017 (English)In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 65, p. 105-119Article in journal (Refereed) Published
Abstract [en]

Electronic health records contain large amounts of longitudinal data that are valuable for biomedical informatics research. The application of machine learning is a promising alternative to manual analysis of such data. However, the complex structure of the data, which includes clinical events that are unevenly distributed over time, poses a challenge for standard learning algorithms. Some approaches to modeling temporal data rely on extracting single values from time series; however, this leads to the loss of potentially valuable sequential information. How to better account for the temporality of clinical data, hence, remains an important research question. In this study, novel representations of temporal data in electronic health records are explored. These representations retain the sequential information, and are directly compatible with standard machine learning algorithms. The explored methods are based on symbolic sequence representations of time series data, which are utilized in a number of different ways. An empirical investigation, using 19 datasets comprising clinical measurements observed over time from a real database of electronic health records, shows that using a distance measure to random subsequences leads to substantial improvements in predictive performance compared to using the original sequences or clustering the sequences. Evidence is moreover provided on the quality of the symbolic sequence representation by comparing it to sequences that are generated using domain knowledge by clinical experts. The proposed method creates representations that better account for the temporality of clinical events, which is often key to prediction tasks in the biomedical domain.
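The distance-to-random-subsequences idea from the abstract can be sketched as below. The integer-breakpoint discretization and the summed symbol difference are simplified illustrations (in the spirit of SAX-style symbolic representations), not the paper's exact construction.

```python
def symbolize(series, breakpoints=(-0.5, 0.5)):
    """Map each (assumed z-normalized) value to an integer symbol by
    counting how many breakpoints it exceeds."""
    return [sum(v > b for b in breakpoints) for v in series]

def min_distance(symbols, pattern):
    """Smallest summed symbol difference of `pattern` over all sliding
    windows of the symbolic sequence; used as one feature value."""
    w = len(pattern)
    return min(sum(abs(a - b) for a, b in zip(symbols[i:i + w], pattern))
               for i in range(len(symbols) - w + 1))

series = [0.1, 1.2, -0.9, 0.2, 0.8]     # one clinical measurement over time
syms = symbolize(series)                # -> [1, 2, 0, 1, 2]
pattern = [2, 0]                        # a "random" subsequence, fixed here
print(min_distance(syms, pattern))      # -> 0 (an exact match exists)
```

Computing this distance for many randomly drawn subsequences turns each variable-length series into a fixed-length feature vector that any standard learner can consume, while still reflecting the order of events.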

Keywords
random subsequence, time series classification, electronic health records, data mining, machine learning
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-137481 (URN)10.1016/j.jbi.2016.11.006 (DOI)000406235200008 ()
Available from: 2017-01-08 Created: 2017-01-08 Last updated: 2022-03-23 Bibliographically approved
Karlsson, I., Papapetrou, P., Asker, L., Boström, H. & Persson, H. E. (2017). Mining disproportional itemsets for characterizing groups of heart failure patients from administrative health records. In: Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments: . Paper presented at 10th International Conference on PErvasive Technologies Related to Assistive Environments, Island of Rhodes, Greece, June 21 - 23, 2017 (pp. 394-398). Association for Computing Machinery (ACM)
Mining disproportional itemsets for characterizing groups of heart failure patients from administrative health records
2017 (English)In: Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments, Association for Computing Machinery (ACM), 2017, p. 394-398Conference paper, Published paper (Refereed)
Abstract [en]

Heart failure is a serious medical condition involving decreased quality of life and an increased risk of premature death. A recent evaluation by the Swedish National Board of Health and Welfare shows that Swedish heart failure patients are often undertreated and do not receive basic medication as recommended by the national guidelines for treatment of heart failure. The objective of this paper is to use registry data to characterize groups of heart failure patients, with an emphasis on basic treatment. Towards this end, we explore the applicability of frequent itemset mining and disproportionality analysis for finding interesting and distinctive characterizations of a target group of patients, e.g., those who have received basic treatment, against a control group, e.g., those who have not received basic treatment. Our empirical evaluation is performed on data extracted from administrative health records from Stockholm County covering the years 2010-2016. Our findings suggest that frequency is not always the most appropriate measure of importance for frequent itemsets, while itemset disproportionality against a control group provides alternative rankings of the extracted itemsets, leading to some medically intuitive characterizations of the target groups.
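The core ranking measure can be sketched as a support ratio between target and control groups. The smoothing constant and the example drug names are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions (sets of items) containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def disproportionality(itemset, target, control, smooth=1e-6):
    """How much more frequent the itemset is in the target group than in
    the control group; values far above 1 mark distinctive patterns."""
    return (support(itemset, target) + smooth) / (support(itemset, control) + smooth)

target  = [{"ACE-inhibitor", "beta-blocker"}, {"beta-blocker"}]
control = [{"diuretic"}, {"beta-blocker", "diuretic"}]
iset = frozenset({"beta-blocker"})
print(disproportionality(iset, target, control))  # ~2: overrepresented in target
```

Ranking itemsets by this ratio, rather than by raw frequency, surfaces patterns that distinguish the groups even when they are not the most common overall, which is the paper's central observation.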

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2017
Keywords
frequent itemsets, disproportionality analysis, heart failure
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-149270 (URN)10.1145/3056540.3076177 (DOI)978-1-4503-5227-7 (ISBN)
Conference
10th International Conference on PErvasive Technologies Related to Assistive Environments, Island of Rhodes, Greece, June 21 - 23, 2017
Available from: 2017-11-24 Created: 2017-11-24 Last updated: 2022-02-28 Bibliographically approved
Linusson, H., Norinder, U., Boström, H., Johansson, U. & Löfström, T. (2017). On the Calibration of Aggregated Conformal Predictors. In: Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Harris Papadopoulos (Ed.), Proceedings of Machine Learning Research: Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden. Paper presented at Conformal and Probabilistic Prediction and Applications (COPA 2017), 13-16 June 2017, Stockholm, Sweden (pp. 154-173), 60
On the Calibration of Aggregated Conformal Predictors
2017 (English)In: Proceedings of Machine Learning Research: Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Harris Papadopoulos, 2017, Vol. 60, p. 154-173Conference paper, Published paper (Refereed)
Abstract [en]

Conformal prediction is a learning framework that produces models that associate with each of their predictions a measure of statistically valid confidence. These models are typically constructed on top of traditional machine learning algorithms. An important result of conformal prediction theory is that the models produced are provably valid under relatively weak assumptions—in particular, their validity is independent of the specific underlying learning algorithm on which they are based. Since validity is automatic, much research on conformal predictors has been focused on improving their informational and computational efficiency. As part of the efforts in constructing efficient conformal predictors, aggregated conformal predictors were developed, drawing inspiration from the field of classification and regression ensembles. Unlike early definitions of conformal prediction procedures, the validity of aggregated conformal predictors is not fully understood—while it has been shown that they might attain empirical exact validity under certain circumstances, their theoretical validity is conditional on additional assumptions that require further clarification. In this paper, we show why validity is not automatic for aggregated conformal predictors, and provide a revised definition of aggregated conformal predictors that gains approximate validity conditional on properties of the underlying learning algorithm.
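An aggregated conformal predictor of the kind discussed above can be sketched by averaging p-values across several calibration folds. Averaging is one common aggregation rule; the paper's revised definition imposes further conditions that this toy example does not capture.

```python
def fold_p_value(score, calib_scores):
    """Conformal p-value from a single calibration fold."""
    return (sum(s >= score for s in calib_scores) + 1) / (len(calib_scores) + 1)

def aggregated_p_value(score, folds):
    """Mean of the per-fold conformal p-values; the aggregation step that
    can break the automatic validity of a single-split conformal predictor."""
    return sum(fold_p_value(score, f) for f in folds) / len(folds)

folds = [[0.2, 0.4, 0.9], [0.1, 0.3], [0.5, 0.6, 0.7, 0.8]]
print(aggregated_p_value(0.55, folds))
```

Each fold's p-value is exactly valid on its own, but their average is a different random variable, which is why validity of the aggregate needs separate justification, as the abstract explains.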

Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 60
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:su:diva-192606 (URN)
Conference
Conformal and Probabilistic Prediction and Applications (COPA 2017), 13-16 June 2017, Stockholm, Sweden
Available from: 2021-04-25 Created: 2021-04-25 Last updated: 2022-02-25 Bibliographically approved
Gurung, R. B., Lindgren, T. & Boström, H. (2017). Predicting NOx sensor failure in heavy duty trucks using histogram-based random forests. International Journal of Prognostics and Health Management, 8(1), Article ID 008.
Predicting NOx sensor failure in heavy duty trucks using histogram-based random forests
2017 (English)In: International Journal of Prognostics and Health Management, E-ISSN 2153-2648, Vol. 8, no 1, article id 008Article in journal (Refereed) Published
Abstract [en]

Being able to accurately predict the impending failures of truck components is often associated with significant cost savings, customer satisfaction and flexibility in maintenance service plans. However, because of the diversity in the way trucks typically are configured and used under different conditions, the creation of accurate prediction models is not an easy task. This paper describes an effort in creating such a prediction model for the NOx sensor, i.e., a component measuring the emitted level of nitrogen oxide in the exhaust of the engine. This component was chosen because it is vital for the truck to function properly, while at the same time being very fragile and costly to repair. As input to the model, technical specifications of trucks and their operational data are used. The process of collecting the data and making it ready for training the model via a slightly modified random forest learning algorithm is described, along with various challenges encountered during this process. The operational data consists of features represented as histograms, posing an additional challenge for the data analysis task. In the study, a modified version of the random forest algorithm is employed, which exploits the fact that the individual bins in the histograms are related, in contrast to the standard approach that would consider the bins as independent features. Experiments are conducted using the updated random forest algorithm, and they clearly show that the modified version is indeed beneficial when compared to the standard random forest algorithm. The performance of the resulting prediction model for the NOx sensor is promising and may be adopted for the benefit of operators of heavy trucks.
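The data-preparation remark about change variables can be illustrated with a small sketch. The per-bin delta between consecutive histogram snapshots is an illustrative choice, not necessarily the paper's exact derived variable.

```python
def histogram_deltas(snapshots):
    """snapshots: list of equal-length bin-count lists, oldest first.
    Returns the per-bin difference between the two most recent snapshots,
    a simple variable capturing how the usage histogram is changing."""
    prev, last = snapshots[-2], snapshots[-1]
    return [b - a for a, b in zip(prev, last)]

# e.g. an engine-load histogram logged at three workshop visits
usage = [[10, 5, 0], [12, 4, 1], [15, 4, 3]]
print(histogram_deltas(usage))  # -> [3, 0, 2]
```

Appending such delta variables to the static histogram features gives the forest a view of the trend, which the abstract reports as further improving predictive performance.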

Keywords
Histogram Features, NOx sensor prognostics, Histogram-based random forest
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-149432 (URN)10.36001/ijphm.2017.v8i1.2535 (DOI)
Available from: 2017-11-30 Created: 2017-11-30 Last updated: 2023-07-24 Bibliographically approved