Modeling Electronic Health Records in Ensembles of Semantic Spaces for Adverse Drug Event Detection
2015 (English) In: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE conference proceedings, 2015, 343-350 p. Conference paper (Refereed)
Electronic health records (EHRs) are emerging as a potentially valuable source for pharmacovigilance; however, adverse drug events (ADEs), which can be encoded in EHRs by a set of diagnosis codes, are heavily underreported. Alerting systems, able to detect potential ADEs on the basis of patient-specific EHR data, would help to mitigate this problem. To that end, the use of machine learning has proven to be both efficient and effective; however, challenges remain in representing the heterogeneous EHR data, which moreover tends to be high-dimensional and exceedingly sparse, in a manner conducive to learning high-performing predictive models. Prior work has shown that distributional semantics – that is, natural language processing methods that, traditionally, model the meaning of words in semantic (vector) space on the basis of co-occurrence information – can be exploited to create effective representations of sequential EHR data, not only free-text in clinical notes but also various clinical events such as diagnoses, drugs and measurements. When modeling data in semantic space, an important design decision concerns the size of the context window around an object of interest, which governs the scope of co-occurrence information that is taken into account and affects the composition of the resulting semantic space. Here, we report on experiments conducted on 27 clinical datasets, demonstrating that performance can be significantly improved by modeling EHR data in ensembles of semantic spaces, consisting of multiple semantic spaces built with different context window sizes. A follow-up investigation is conducted to study the impact on predictive performance as increasingly more semantic spaces are included in the ensemble, demonstrating that accuracy tends to improve with the number of semantic spaces, albeit not monotonically so.
Finally, a number of different strategies for combining the semantic spaces are explored, demonstrating the advantage of early (feature) fusion over late (classifier) fusion. Ensembles of semantic spaces allow multiple views of (sparse) data to be captured (densely) and thereby enable improved performance to be obtained on the task of detecting ADEs in EHRs.
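The idea of building several semantic spaces with different context window sizes and combining them by early (feature) fusion can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple count-based co-occurrence space (the paper may use a different distributional model), and the event codes, function names and window sizes are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

def build_space(sequences, window):
    """Count-based semantic space (an assumption, not the paper's model):
    each event maps to a Counter of events seen within `window` positions."""
    space = defaultdict(Counter)
    for seq in sequences:
        for i, event in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[event][seq[j]] += 1
    return space

def patient_vector(seq, space, vocab):
    """Sum the context vectors of a patient's events into one vector over vocab."""
    total = Counter()
    for event in seq:
        total.update(space.get(event, Counter()))
    return [total[v] for v in vocab]

def early_fusion(seq, spaces, vocab):
    """Early (feature) fusion: concatenate the patient's vectors from every space."""
    fused = []
    for space in spaces:
        fused.extend(patient_vector(seq, space, vocab))
    return fused

# Hypothetical patient histories as ordered sequences of clinical events.
sequences = [
    ["drug:A", "dx:X", "lab:L1", "drug:B"],
    ["drug:A", "lab:L1", "dx:Y"],
]
vocab = sorted({event for seq in sequences for event in seq})

# One semantic space per context window size (sizes are illustrative).
spaces = [build_space(sequences, w) for w in (1, 2)]

# One fused feature vector per patient, ready for a single classifier.
features = [early_fusion(seq, spaces, vocab) for seq in sequences]
```

Under late (classifier) fusion, one model would instead be trained per semantic space and their predictions combined afterwards; the sketch covers only the early-fusion variant that the abstract reports as advantageous.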
Place, publisher, year, edition, pages
IEEE conference proceedings, 2015. 343-350 p.
distributional semantics, semantic space ensembles, ensemble models, electronic health records, adverse drug events, predictive modeling, information fusion
Language Technology (Computational Linguistics); Computer Science
Research subject: Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-122463
DOI: 10.1109/BIBM.2015.7359705
OAI: oai:DiVA.org:su-122463
DiVA: diva2:866461
IEEE BIBM, International Conference on Bioinformatics and Biomedicine, 09-12 November 2015, U.S.A, Washington, D.C.
Projects
High-Performance Data Mining for Drug Effect Detection
Funder
Swedish Foundation for Strategic Research, IIS11-0053