Open this publication in new window or tab >>2015 (English)In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 57, p. 333-349Article in journal (Refereed) Published
Abstract [en]
For the purpose of post-marketing drug safety surveillance, which has traditionally relied on the volun- tary reporting of individual cases of adverse drug events (ADEs), other sources of information are now being explored, including electronic health records (EHRs), which give us access to enormous amounts of longitudinal observations of the treatment of patients and their drug use. Adverse drug events, which can be encoded in EHRs with certain diagnosis codes, are, however, heavily underreported. It is therefore important to develop capabilities to process, by means of computational methods, the more unstructured EHR data in the form of clinical notes, where clinicians may describe and reason around suspected ADEs. In this study, we report on the creation of an annotated corpus of Swedish health records for the purpose of learning to identify information pertaining to ADEs present in clinical notes. To this end, three key tasks are tackled: recognizing relevant named entities (disorders, symptoms, drugs), labeling attributes of the recognized entities (negation, speculation, temporality), and relationships between them (indication, adverse drug event). For each of the three tasks, leveraging models of distributional semantics – i.e., unsupervised methods that exploit co-occurrence information to model, typically in vector space, the meaning of words – and, in particular, combinations of such models, is shown to improve the predictive performance. The ability to make use of such unsupervised methods is critical when faced with large amounts of sparse and high-dimensional data, especially in domains where annotated resources are scarce.
Keywords
adverse drug events, electronic health records, corpus annotation, machine learning, distributional semantics, relation extraction
National Category
Computer Sciences Natural Language Processing
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-122464 (URN)10.1016/j.jbi.2015.08.013 (DOI)000363437500028 ()
Projects
High-Performance Data Mining for Drug Effect Detection
Funder
Swedish Foundation for Strategic Research , IIS11-0053
2015-11-022015-11-022025-02-01Bibliographically approved