Publications (10 of 35)
Henriksson, A., Kvist, M. & Dalianis, H. (2017). Detecting Protected Health Information in Heterogeneous Clinical Notes. In: Adi V. Gundlapalli, Marie-Christine Jaulent, Dongsheng Zhao (Ed.), MEDINFO 2017: Precision Healthcare through Informatics. Paper presented at 16th World Congress of Medical and Health Informatics (MedInfo2017), Hangzhou, China, August 21-25, 2017 (pp. 393-397). IOS Press
2017 (English). In: MEDINFO 2017: Precision Healthcare through Informatics / [ed] Adi V. Gundlapalli, Marie-Christine Jaulent, Dongsheng Zhao, IOS Press, 2017, p. 393-397. Conference paper, Published paper (Refereed)
Abstract [en]

To enable secondary use of healthcare data in a privacy-preserving manner, there is a need for methods capable of automatically identifying protected health information (PHI) in clinical text. To that end, learning predictive models from labeled examples has emerged as a promising alternative to rule-based systems. However, little is known about differences with respect to PHI prevalence in different types of clinical notes and how potential domain differences may affect the performance of predictive models trained on one particular type of note and applied to another. In this study, we analyze the performance of a predictive model trained on an existing PHI corpus of Swedish clinical notes and applied to a variety of clinical notes: written (i) in different clinical specialties, (ii) under different headings, and (iii) by persons in different professions. The results indicate that domain adaptation is needed for effective detection of PHI in heterogeneous clinical notes.

Place, publisher, year, edition, pages
IOS Press, 2017
Series
Studies in Health Technology and Informatics, ISSN 0926-9630, E-ISSN 1879-8365 ; 245
Keywords
Data Anonymization, Electronic Health Records, Natural Language Processing
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-150179 (URN); 10.3233/978-1-61499-830-3-393 (DOI); 000449471200082 (); 978-1-61499-829-7 (ISBN); 978-1-61499-830-3 (ISBN)
Conference
16th World Congress of Medical and Health Informatics (MedInfo2017), Hangzhou, China, August 21-25, 2017
Available from: 2017-12-13. Created: 2017-12-13. Last updated: 2022-02-28. Bibliographically approved
Henriksson, A., Kvist, M. & Dalianis, H. (2017). Prevalence Estimation of Protected Health Information in Swedish Clinical Text. In: Rebecca Randell, Ronald Cornet, Colin McCowan, Niels Peek, Philip J. Scott (Ed.), Informatics for Health: Connected Citizen-Led Wellness and Population Health. Paper presented at The Medical Informatics Europe (MIE) Conference, Manchester, UK, 24-26 April, 2017 (pp. 216-220). IOS Press
2017 (English). In: Informatics for Health: Connected Citizen-Led Wellness and Population Health / [ed] Rebecca Randell, Ronald Cornet, Colin McCowan, Niels Peek, Philip J. Scott, IOS Press, 2017, p. 216-220. Conference paper, Published paper (Refereed)
Abstract [en]

Obscuring protected health information (PHI) in the clinical text of health records facilitates the secondary use of healthcare data in a privacy-preserving manner. Although automatic de-identification of clinical text using machine learning holds much promise, little is known about the relative prevalence of PHI in different types of clinical text and whether there is a need for domain adaptation when learning predictive models from one particular domain and applying them to another. In this study, we address these questions by training a predictive model and using it to estimate the prevalence of PHI in clinical text written (1) in different clinical specialties, (2) in different types of notes (i.e., under different headings), and (3) by persons in different professional roles. It is demonstrated that the overall PHI density is 1.57%; however, substantial differences exist across domains.

Place, publisher, year, edition, pages
IOS Press, 2017
Series
Studies in Health Technology and Informatics, ISSN 0926-9630, E-ISSN 1879-8365 ; 235
Keywords
electronic health records, protected health information, de-identification, natural language processing, predictive modeling
National Category
Natural Language Processing
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-149433 (URN); 10.3233/978-1-61499-753-5-216 (DOI); 978-1-61499-752-8 (ISBN); 978-1-61499-753-5 (ISBN)
Conference
The Medical Informatics Europe (MIE) Conference, Manchester, UK, 24-26 April, 2017
Available from: 2017-11-30. Created: 2017-11-30. Last updated: 2025-02-07. Bibliographically approved
Grigonyte, G., Kvist, M., Wirén, M., Velupillai, S. & Henriksson, A. (2016). Swedification patterns of Latin and Greek affixes in clinical text. Nordic Journal of Linguistics, 39(1), 5-37
2016 (English). In: Nordic Journal of Linguistics, ISSN 0332-5865, E-ISSN 1502-4717, Vol. 39, no 1, p. 5-37. Article in journal (Refereed) Published
Abstract [en]

Swedish medical language is rich with Latin and Greek terminology which has undergone a Swedification since the 1980s. However, many original expressions are still used by clinical professionals. The goal of this study is to obtain precise quantitative measures of how the foreign terminology is manifested in Swedish clinical text. To this end, we explore the use of Latin and Greek affixes in Swedish medical texts in three genres: clinical text, scientific medical text and online medical information for laypersons. More specifically, we use frequency lists derived from tokenised Swedish medical corpora in the three domains, and extract word pairs belonging to types that display both the original and Swedified spellings. We describe six distinct patterns explaining the variation in the usage of Latin and Greek affixes in clinical text. The results show that to a large extent affixes in clinical text are Swedified and that prefixes are used more conservatively than suffixes.

Keywords
affixes, clinical text, corpus linguistics, health records, Latin and Greek terminology
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-129031 (URN); 10.1017/S0332586515000293 (DOI); 000374241300001 ()
Available from: 2016-04-13. Created: 2016-04-13. Last updated: 2022-02-23. Bibliographically approved
Velupillai, S., Weegar, R. & Kvist, M. (2016). Temporal Annotation of Swedish Intensive Care Notes. Paper presented at AMIA 2016 Annual Symposium, Chicago, USA, November 12 - 16, 2016.
2016 (English). Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

We describe the creation of a corpus of Swedish intensive care unit (ICU) notes annotated for temporal expressions. Clinical notes from an ICU in Stockholm, Sweden were used. The HeidelTime system was adapted to develop Swedish clinical time expression (TIMEX3) resources. Overall micro-average Inter-Annotator Agreement is high (86% F1). We have created Swedish lexical resources with clinically specific time expressions that will be useful for the development of a Swedish clinical text temporal reasoning system.

Keywords
annotations, temporal expressions
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-136636 (URN)
Conference
AMIA 2016 Annual Symposium, Chicago, USA, November 12 - 16, 2016
Available from: 2016-12-12. Created: 2016-12-12. Last updated: 2022-02-28. Bibliographically approved
Weegar, R., Kvist, M., Sundström, K., Brunak, S. & Dalianis, H. (2015). Finding Cervical Cancer Symptoms in Swedish Clinical Text using a Machine Learning Approach and NegEx. In: AMIA Annual Symposium Proceedings. Paper presented at AMIA 2015 Annual Symposium, San Francisco, CA, November 14 - 18, 2015 (pp. 1296-1305). American Medical Informatics Association
2015 (English). In: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2015, p. 1296-1305. Conference paper, Published paper (Refereed)
Abstract [en]

Detection of early symptoms in cervical cancer is crucial for early treatment and survival. To find symptoms of cervical cancer in clinical text, Named Entity Recognition is needed. In this paper the Clinical Entity Finder, a machine-learning tool trained on annotated clinical text from a Swedish internal medicine emergency unit, is evaluated on cervical cancer records. The Clinical Entity Finder identifies entities of the types body part, finding and disorder and is extended with negation detection using the rule-based tool NegEx, to distinguish between negated and non-negated entities. To measure the performance of the tools on this new domain, two physicians annotated a set of clinical notes from the health records of cervical cancer patients. The inter-annotator agreement for finding, disorder and body part obtained an average F-score of 0.677 and the Clinical Entity Finder extended with NegEx had an average F-score of 0.667.

Place, publisher, year, edition, pages
American Medical Informatics Association, 2015
Series
AMIA Annual Symposium Proceedings, ISSN 1559-4076, E-ISSN 1942-597X
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-123947 (URN); 26958270 (PubMedID)
Conference
AMIA 2015 Annual Symposium, San Francisco, CA, November 14 - 18, 2015
Available from: 2015-12-09. Created: 2015-12-09. Last updated: 2022-02-23. Bibliographically approved
Zhao, J., Henriksson, A., Kvist, M., Asker, L. & Boström, H. (2015). Handling Temporality of Clinical Events for Drug Safety Surveillance. AMIA Annual Symposium Proceedings, 2015, 1371-1380
2015 (English). In: AMIA Annual Symposium Proceedings, ISSN 1559-4076, Vol. 2015, p. 1371-1380. Article in journal (Refereed) Published
Abstract [en]

Using longitudinal data in electronic health records (EHRs) for post-marketing adverse drug event (ADE) detection allows for monitoring patients throughout their medical history. Machine learning methods have been shown to be efficient and effective in screening health records and detecting ADEs. How best to exploit historical data, as encoded by clinical events in EHRs is, however, not very well understood. In this study, three strategies for handling temporality of clinical events are proposed and evaluated using an EHR database from Stockholm, Sweden. The random forest learning algorithm is applied to predict fourteen ADEs using clinical events collected from different lengths of patient history. The results show that, in general, including longer patient history leads to improved predictive performance, and that assigning weights to events according to time distance from the ADE yields the biggest improvement.

Keywords
drug safety surveillance, pharmacovigilance, adverse drug events, electronic health records, temporality, predictive modeling
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-123950 (URN)
Available from: 2015-12-09. Created: 2015-12-09. Last updated: 2022-02-23. Bibliographically approved
Dalianis, H., Henriksson, A., Kvist, M., Velupillai, S. & Weegar, R. (2015). HEALTH BANK - A Workbench for Data Science Applications in Healthcare. In: Industry Track Workshop. Paper presented at CAiSE Industry Track, CAiSE-IT 2015 - co-located with 27th Conference on Advanced Information Systems Engineering, CAiSE 2015; Stockholm; Sweden; 11 June 2015 through ; Code 112715 (pp. 1-18). CEUR Workshop Proceedings, 1381
2015 (English). In: Industry Track Workshop, CEUR Workshop Proceedings, 2015, Vol. 1381, p. 1-18. Conference paper, Published paper (Refereed)
Abstract [en]

The enormous amounts of data that are generated in the healthcare process and stored in electronic health record (EHR) systems are an underutilized resource that, with the use of data science applications, can be exploited to improve healthcare. To foster the development and use of data science applications in healthcare, there is a fundamental need for access to EHR data, which is typically not readily available to researchers and developers. A relatively rare exception is the large EHR database, the Stockholm EPR Corpus, comprising data from more than two million patients, that has been made available to a limited group of researchers at Stockholm University. Here, we describe a number of data science applications that have been developed using this database, demonstrating the potential reuse of EHR data to support healthcare and public health activities, as well as facilitate medical research. However, in order to realize the full potential of this resource, it needs to be made available to a larger community of researchers, as well as to industry actors. To that end, we envision the provision of an infrastructure around this database called HEALTH BANK – the Swedish Health Record Research Bank. It will function both as a workbench for the development of data science applications and as a data exploration tool, allowing epidemiologists, pharmacologists and other medical researchers to generate and evaluate hypotheses. Aggregated data will be fed into a pipeline for open e-access, while non-aggregated data will be provided to researchers within an ethical permission framework. We believe that HEALTH BANK has the potential to promote a growing industry around the development of data science applications that will ultimately increase the efficiency and effectiveness of healthcare.

Place, publisher, year, edition, pages
CEUR Workshop Proceedings, 2015
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 1381
Keywords
electronic health record, data science, health intelligence, infrastructure, data mining, text mining, predictive modeling, clinical text, health bank, health record research
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-122827 (URN)
Conference
CAiSE Industry Track, CAiSE-IT 2015 - co-located with 27th Conference on Advanced Information Systems Engineering, CAiSE 2015; Stockholm; Sweden; 11 June 2015 through ; Code 112715
Available from: 2015-11-11. Created: 2015-11-10. Last updated: 2022-02-23. Bibliographically approved
Henriksson, A., Kvist, M., Dalianis, H. & Duneld, M. (2015). Identifying adverse drug event information in clinical notes with distributional semantic representations of context. Journal of Biomedical Informatics, 57, 333-349
2015 (English). In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 57, p. 333-349. Article in journal (Refereed) Published
Abstract [en]

For the purpose of post-marketing drug safety surveillance, which has traditionally relied on the voluntary reporting of individual cases of adverse drug events (ADEs), other sources of information are now being explored, including electronic health records (EHRs), which give us access to enormous amounts of longitudinal observations of the treatment of patients and their drug use. Adverse drug events, which can be encoded in EHRs with certain diagnosis codes, are, however, heavily underreported. It is therefore important to develop capabilities to process, by means of computational methods, the more unstructured EHR data in the form of clinical notes, where clinicians may describe and reason around suspected ADEs. In this study, we report on the creation of an annotated corpus of Swedish health records for the purpose of learning to identify information pertaining to ADEs present in clinical notes. To this end, three key tasks are tackled: recognizing relevant named entities (disorders, symptoms, drugs), labeling attributes of the recognized entities (negation, speculation, temporality), and relationships between them (indication, adverse drug event). For each of the three tasks, leveraging models of distributional semantics – i.e., unsupervised methods that exploit co-occurrence information to model, typically in vector space, the meaning of words – and, in particular, combinations of such models, is shown to improve the predictive performance. The ability to make use of such unsupervised methods is critical when faced with large amounts of sparse and high-dimensional data, especially in domains where annotated resources are scarce.

Keywords
adverse drug events, electronic health records, corpus annotation, machine learning, distributional semantics, relation extraction
National Category
Computer Sciences; Natural Language Processing
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-122464 (URN); 10.1016/j.jbi.2015.08.013 (DOI); 000363437500028 ()
Projects
High-Performance Data Mining for Drug Effect Detection
Funder
Swedish Foundation for Strategic Research, IIS11-0053
Available from: 2015-11-02. Created: 2015-11-02. Last updated: 2025-02-01. Bibliographically approved
Velupillai, S., Duneld, M., Henriksson, A., Kvist, M., Skeppstedt, M. & Dalianis, H. (Eds.). (2015). Louhi 2014: Special issue on health text mining and information analysis. Paper presented at EACL 2014 Workshop - The Fifth International Workshop on Health Text Mining and Information Analysis, Gothenburg, Sweden, April 27, 2014. London: BioMed Central
2015 (English). Conference proceedings (editor) (Refereed)
Place, publisher, year, edition, pages
London: BioMed Central, 2015
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-119911 (URN)
Conference
EACL 2014 Workshop - The Fifth International Workshop on Health Text Mining and Information Analysis, Gothenburg, Sweden, April 27, 2014
Note

Special Issue: BMC Medical Informatics and Decision Making, ISSN 1472-6947, Volume 15, Supplement 2.

Available from: 2015-11-11. Created: 2015-08-28. Last updated: 2022-02-23. Bibliographically approved
Velupillai, S., Duneld, M., Henriksson, A., Kvist, M., Skeppstedt, M. & Dalianis, H. (2015). Louhi 2014: Special issue on health text mining and information analysis: introduction. Paper presented at Louhi 2014: The Fifth International Workshop on Health Text Mining and Information Analysis, Gothenburg, Sweden, April 27, 2014. BMC Medical Informatics and Decision Making, 2(SI), 1-3
2015 (English). In: BMC Medical Informatics and Decision Making, E-ISSN 1472-6947, Vol. 2, no SI, p. 1-3. Article in journal (Refereed) Published
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-119912 (URN); 10.1186/1472-6947-15-S2-S1 (DOI); 000367479700001 ()
Conference
Louhi 2014: The Fifth International Workshop on Health Text Mining and Information Analysis, Gothenburg, Sweden, April 27, 2014
Available from: 2015-11-11. Created: 2015-08-28. Last updated: 2022-05-10. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-5780-0063