Identifying adverse drug event information in clinical notes with distributional semantic representations of context
2015 (English) In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 57, 333-349 p. Article in journal (Refereed) Published
For the purpose of post-marketing drug safety surveillance, which has traditionally relied on the voluntary reporting of individual cases of adverse drug events (ADEs), other sources of information are now being explored, including electronic health records (EHRs), which give us access to enormous amounts of longitudinal observations of the treatment of patients and their drug use. Adverse drug events, which can be encoded in EHRs with certain diagnosis codes, are, however, heavily underreported. It is therefore important to develop capabilities to process, by means of computational methods, the more unstructured EHR data in the form of clinical notes, where clinicians may describe and reason around suspected ADEs. In this study, we report on the creation of an annotated corpus of Swedish health records for the purpose of learning to identify information pertaining to ADEs present in clinical notes. To this end, three key tasks are tackled: recognizing relevant named entities (disorders, symptoms, drugs), labeling attributes of the recognized entities (negation, speculation, temporality), and extracting relationships between them (indication, adverse drug event). For each of the three tasks, leveraging models of distributional semantics – i.e., unsupervised methods that exploit co-occurrence information to model, typically in vector space, the meaning of words – and, in particular, combinations of such models, is shown to improve the predictive performance. The ability to make use of such unsupervised methods is critical when faced with large amounts of sparse and high-dimensional data, especially in domains where annotated resources are scarce.
Place, publisher, year, edition, pages
2015. Vol. 57, 333-349 p.
adverse drug events, electronic health records, corpus annotation, machine learning, distributional semantics, relation extraction
Computer Science; Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-122464
DOI: 10.1016/j.jbi.2015.08.013
ISI: 000363437500028
OAI: oai:DiVA.org:su-122464
DiVA: diva2:866463
Projects
High-Performance Data Mining for Drug Effect Detection
Funder
Swedish Foundation for Strategic Research, IIS11-0053