Learning from heterogeneous temporal data from electronic health records
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2017 (English). In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 65, p. 105-119. Article in journal (Refereed), Published
Abstract [en]

Electronic health records contain large amounts of longitudinal data that are valuable for biomedical informatics research. The application of machine learning is a promising alternative to manual analysis of such data. However, the complex structure of the data, which includes clinical events that are unevenly distributed over time, poses a challenge for standard learning algorithms. Some approaches to modeling temporal data rely on extracting single values from time series; however, this leads to the loss of potentially valuable sequential information. How to better account for the temporality of clinical data, hence, remains an important research question. In this study, novel representations of temporal data in electronic health records are explored. These representations retain the sequential information, and are directly compatible with standard machine learning algorithms. The explored methods are based on symbolic sequence representations of time series data, which are utilized in a number of different ways. An empirical investigation, using 19 datasets comprising clinical measurements observed over time from a real database of electronic health records, shows that using a distance measure to random subsequences leads to substantial improvements in predictive performance compared to using the original sequences or clustering the sequences. Evidence is moreover provided on the quality of the symbolic sequence representation by comparing it to sequences that are generated using domain knowledge by clinical experts. The proposed method creates representations that better account for the temporality of clinical events, which is often key to prediction tasks in the biomedical domain.
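As a rough illustration of the idea of using distances to random subsequences as features, the following minimal Python sketch turns symbolic sequences into fixed-length numeric vectors that a standard learner could consume. It is not the authors' implementation: the abstract does not specify the discretization, the distance measure, or the sampling scheme, so the sketch assumes threshold-based symbols, Levenshtein (edit) distance, and uniform random sampling. All function names and the example values are hypothetical.

```python
import random
from bisect import bisect_right


def symbolize(values, breakpoints, alphabet="abcdefghij"):
    """Map each numeric value to a symbol according to the interval it falls in."""
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in values)


def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def distance_to_subsequence(sequence, pattern):
    """Smallest edit distance between pattern and any equal-length window."""
    if len(sequence) <= len(pattern):
        return edit_distance(sequence, pattern)
    n = len(pattern)
    return min(edit_distance(sequence[i:i + n], pattern)
               for i in range(len(sequence) - n + 1))


def sample_subsequences(sequences, k, length, rng):
    """Draw k random contiguous subsequences of the given length."""
    candidates = [s for s in sequences if len(s) >= length]
    patterns = []
    for _ in range(k):
        s = rng.choice(candidates)
        start = rng.randrange(len(s) - length + 1)
        patterns.append(s[start:start + length])
    return patterns


def featurize(sequences, patterns):
    """One row per sequence, one column per sampled subsequence."""
    return [[distance_to_subsequence(s, p) for p in patterns]
            for s in sequences]


# Hypothetical measurements of unequal length for three patients.
measurements = [[4.1, 5.0, 7.2, 9.8], [5.5, 5.4, 5.6], [9.0, 8.1, 6.0, 4.2, 3.9]]
sequences = [symbolize(m, breakpoints=[5.0, 8.0]) for m in measurements]
patterns = sample_subsequences(sequences, k=4, length=2, rng=random.Random(0))
features = featurize(sequences, patterns)  # 3 rows x 4 columns
```

Because every sequence is described by its distance to the same set of sampled subsequences, sequences of different lengths end up with feature vectors of equal length, which is what makes such a representation usable with standard learning algorithms.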

Place, publisher, year, edition, pages
2017. Vol. 65, 105-119 p.
Keyword [en]
random subsequence, time series classification, electronic health records, data mining, machine learning
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-137481
DOI: 10.1016/j.jbi.2016.11.006
OAI: oai:DiVA.org:su-137481
DiVA: diva2:1062756
Available from: 2017-01-08. Created: 2017-01-08. Last updated: 2017-01-23. Bibliographically approved.
Authors
Zhao, Jing; Papapetrou, Panagiotis; Asker, Lars; Boström, Henrik