Detecting Adverse Drug Events with Multiple Representations of Clinical Measurements
2014 (English). In: Proceedings 2014 IEEE International Conference on Bioinformatics and Biomedicine / [ed] Huiru Zeng et al., IEEE Computer Society, 2014, pp. 536-543. Conference paper (Refereed)
Adverse drug events (ADEs) are grossly under-reported in electronic health records (EHRs). This could be mitigated by methods that are able to detect ADEs in EHRs, thereby allowing missing ADE-specific diagnosis codes to be identified and added. A crucial aspect of constructing such systems is finding proper representations of the data so that the predictive modeling can be as accurate as possible. One category of EHR data that can be used as an indicator of ADEs is clinical measurements. However, using clinical measurements as features is problematic because of the high rate of missing values and because they can be repeated a variable number of times in each patient's health record. In this study, five basic representations of clinical measurements are proposed and evaluated to handle these two problems. An empirical investigation using random forest on 27 datasets from a real EHR database with different ADE targets is presented, demonstrating that the predictive performance, in terms of accuracy and area under the ROC curve, is higher when clinical measurements are represented crudely as whether they were taken or how many times they were taken by a patient. Furthermore, a sixth alternative, combining all five basic representations, significantly outperforms all but one of the basic representations. A subsequent analysis of variable importance is also conducted with this fused feature set, showing that when clinical measurements have a high missing rate, the number of times they were taken by a patient is ranked as more informative than their actual values. The observation from random forest is also confirmed empirically using other commonly employed classifiers. This study demonstrates that the way in which clinical measurements from EHRs are represented has a high impact on ADE detection, and that using multiple representations outperforms using a single basic representation.
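As a rough illustration of the idea in the abstract, the sketch below turns repeated, partly missing measurements into per-patient features. Only the "was it taken" and "how many times was it taken" representations are named in the abstract; the mean-value feature, the function name `build_features`, the record layout, and the sample values are assumptions made for this example and are not taken from the paper. Concatenating the columns stands in for the fused feature set.

```python
from collections import defaultdict
from statistics import mean


def build_features(records, patients, measurements):
    """Return {patient: {feature_name: value}} for three illustrative representations.

    records: iterable of (patient_id, measurement_name, value) tuples, where a
    measurement may appear zero, one, or many times per patient.
    """
    # Group every observed value by (patient, measurement).
    values = defaultdict(list)
    for patient, name, value in records:
        values[(patient, name)].append(value)

    features = {}
    for patient in patients:
        row = {}
        for name in measurements:
            observed = values.get((patient, name), [])
            # Representation named in the abstract: was the measurement taken?
            row[f"{name}_taken"] = 1 if observed else 0
            # Representation named in the abstract: how many times was it taken?
            row[f"{name}_count"] = len(observed)
            # Assumed value-based representation (not from the abstract):
            # the mean, left as None when the measurement is missing.
            row[f"{name}_mean"] = mean(observed) if observed else None
        features[patient] = row
    return features


# Made-up sample data, for illustration only.
records = [
    ("p1", "creatinine", 80.0),
    ("p1", "creatinine", 100.0),
    ("p2", "sodium", 140.0),
]
features = build_features(records, ["p1", "p2"], ["creatinine", "sodium"])
```

The presence and count features are defined for every patient, whereas the value-based feature is undefined whenever a measurement was never taken, which is the missing-value problem the abstract describes.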
Place, publisher, year, edition, pages
IEEE Computer Society, 2014. 536-543 p.
Research subject: Computer and Systems Sciences
Identifiers: URN: urn:nbn:se:su:diva-110970; DOI: 10.1109/BIBM.2014.6999216; ISBN: 978-1-4799-5669-2; OAI: oai:DiVA.org:su-110970; DiVA: diva2:773744
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference, 2-5 November, 2014, Belfast, UK