Publications (10 of 98)
Valik, J. K., Ward, L., Tanushi, H., Johansson, A. F., Färnert, A., Mogensen, M. L., . . . Nauclér, P. (2023). Predicting sepsis onset using a machine learned causal probabilistic network algorithm based on electronic health records data. Scientific Reports, 13(1), Article ID 11760.
2023 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 13, no 1, article id 11760. Article in journal (Refereed). Published.
Abstract [en]

Sepsis is a leading cause of mortality, and early identification improves survival. With the increasing digitalization of health care data, automated sepsis prediction models hold promise to aid prompt recognition. Most previous studies have focused on the intensive care unit (ICU) setting. Yet only a small proportion of sepsis cases develop in the ICU, and there is a clear clinical benefit in identifying patients earlier in the disease trajectory. In this cohort of 82,852 hospital admissions and 8038 sepsis episodes classified according to the Sepsis-3 criteria, we demonstrate that a machine learned score can predict sepsis onset within 48 h using sparse routine electronic health record data outside the ICU. Our score was based on a causal probabilistic network model, SepsisFinder, which has similarities with clinical reasoning. A prediction was generated hourly on all admissions, provided a new variable was registered. Compared to the National Early Warning Score (NEWS2), an established method to identify sepsis, SepsisFinder triggered earlier and had a higher area under the receiver operating characteristic curve (AUROC) (0.950 vs. 0.872) as well as area under the precision-recall curve (APR) (0.189 vs. 0.149). A machine learning comparator based on a gradient-boosting decision tree model had a similar AUROC (0.949) and higher APR (0.239) than SepsisFinder but triggered later than both NEWS2 and SepsisFinder. The precision of SepsisFinder increased when screening was restricted to the earlier admission period and in episodes with bloodstream infection. Furthermore, SepsisFinder signaled a median of 5.5 h prior to antibiotic administration. Identifying a high-risk population with this method could be used to tailor clinical interventions and improve patient care.
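The abstract compares the scores in terms of AUROC and area under the precision-recall curve (APR). A minimal sketch of how such a comparison can be computed with scikit-learn is shown below; the label and score arrays are synthetic placeholders, not the study's data.

```python
# Hedged sketch: comparing two risk scores (e.g., a model score and NEWS2)
# on the same labels using AUROC and average precision (an APR estimate).
# All arrays below are synthetic placeholders, not data from the paper.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)        # 1 = sepsis onset within 48 h
model_score = rng.random(1000)                # hourly risk score from a model
news2_score = rng.integers(0, 15, size=1000)  # NEWS2 total score

for name, score in [("model", model_score), ("NEWS2", news2_score)]:
    print(name,
          "AUROC:", round(roc_auc_score(labels, score), 3),
          "APR:", round(average_precision_score(labels, score), 3))
```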

National Category
Infectious Medicine; General Practice
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-221264 (URN); 10.1038/s41598-023-38858-4 (DOI); 001034058100024 (); 37474597 (PubMedID); 2-s2.0-85165415402 (Scopus ID)
Available from: 2023-09-26 Created: 2023-09-26 Last updated: 2023-10-04. Bibliographically approved.
Vakili, T. & Dalianis, H. (2023). Using Membership Inference Attacks to Evaluate Privacy-Preserving Language Modeling Fails for Pseudonymizing Data. In: 24th Nordic Conference on Computational Linguistics (NoDaLiDa). Paper presented at the Nordic Conference on Computational Linguistics (pp. 318-323).
2023 (English). In: 24th Nordic Conference on Computational Linguistics (NoDaLiDa), 2023, p. 318-323. Conference paper, Published paper (Refereed).
Abstract [en]

Large pre-trained language models dominate the current state of the art for many natural language processing applications, including clinical NLP. Several studies have found that these models can be susceptible to privacy attacks that are unacceptable in the clinical domain, where personally identifiable information (PII) must not be exposed.

However, there is no consensus regarding how to quantify the privacy risks of different models. One prominent suggestion is to quantify these risks using membership inference attacks. In this study, we show that a state-of-the-art membership inference attack on a clinical BERT model fails to detect the privacy benefits from pseudonymizing data. This suggests that such attacks may be inadequate for evaluating token-level privacy preservation of PIIs.
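As an illustration of the kind of attack being evaluated, the sketch below shows a simple loss-threshold membership inference attack; it is a generic textbook formulation over synthetic losses, not the specific attack or data used in the paper.

```python
# Hedged sketch (not the paper's attack): a loss-threshold membership
# inference attack, where lower per-example loss under the target model
# is taken as evidence of training-set membership. The losses below are
# synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
member_losses = rng.normal(2.0, 0.5, size=500)     # losses on (assumed) training documents
nonmember_losses = rng.normal(2.4, 0.5, size=500)  # losses on held-out documents

losses = np.concatenate([member_losses, nonmember_losses])
is_member = np.concatenate([np.ones(500), np.zeros(500)])

# Lower loss -> higher membership score; an attack AUC near 0.5 means the
# attack cannot distinguish members from non-members.
print("attack AUC:", round(roc_auc_score(is_member, -losses), 3))
```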

Series
Northern European Association for Language Technology (NEALT), ISSN 1736-8197, E-ISSN 1736-6305 ; 52
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-216681 (URN)
Conference
Nordic Conference on Computational Linguistics
Available from: 2023-04-24 Created: 2023-04-24 Last updated: 2023-10-04. Bibliographically approved.
Chomutare, T., Budrionis, A. & Dalianis, H. (2022). Combining deep learning and fuzzy logic to predict rare ICD-10 codes from clinical notes. In: Sheikh Iqbal Ahamed; Claudio Agostino Ardagna; Hongyi Bian; Mario Bochicchio; Carl K. Chang; Rong N. Chang; Ernesto Damiani; Lin Liu; Misha Pavel; Corrado Priami; Hossain Shahriar; Robert Ward; Fatos Xhafa; Jia Zhang; Farhana Zulkernine (Ed.), Proceedings - 2022 IEEE International Conference on Digital Health (ICDH 2022). Paper presented at 2022 IEEE International Conference on Digital Health (ICDH 2022), Barcelona, Spain (hybrid), 11-15 July, 2022 (pp. 163-168). Piscataway: IEEE
2022 (English). In: Proceedings - 2022 IEEE International Conference on Digital Health (ICDH 2022) / [ed] Sheikh Iqbal Ahamed; Claudio Agostino Ardagna; Hongyi Bian; Mario Bochicchio; Carl K. Chang; Rong N. Chang; Ernesto Damiani; Lin Liu; Misha Pavel; Corrado Priami; Hossain Shahriar; Robert Ward; Fatos Xhafa; Jia Zhang; Farhana Zulkernine, Piscataway: IEEE, 2022, p. 163-168. Conference paper, Published paper (Refereed).
Abstract [en]

Computer-assisted coding (CAC) of clinical text into standardized classifications such as ICD-10 is an important challenge. For frequently used ICD-10 codes, deep learning approaches have been quite successful. For rare codes, however, the problem remains unsolved. To improve performance for rare codes, a pipeline is proposed that takes advantage of the ICD-10 code hierarchy to combine the semantic capabilities of deep learning with the flexibility of fuzzy logic. The data used are discharge summaries in Swedish in the medical speciality of gastrointestinal diseases. Using our pipeline, fuzzy matching computation time is reduced and the accuracy of the top-10 hits for rare codes is improved. While the method is promising, further work is required before the pipeline can be part of a usable prototype. Code repository: https://github.com/icd-coding/zeroshot.
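One way the ICD-10 hierarchy can constrain fuzzy matching is to score only those rare codes whose 3-character category a deep learning model has already predicted. The sketch below illustrates this idea with the standard library's difflib; the toy code list and predicted categories are made-up examples, not the authors' pipeline (their actual code is in the linked repository).

```python
# Hedged sketch (assumptions, not the authors' pipeline): restrict fuzzy
# matching of rare ICD-10 codes to the 3-character categories predicted by
# a deep learning model, shrinking the search space via the code hierarchy.
from difflib import SequenceMatcher

rare_codes = {  # hypothetical rare codes and descriptions
    "K51.2": "Ulcerative (chronic) proctitis",
    "K51.3": "Ulcerative (chronic) rectosigmoiditis",
    "K57.1": "Diverticular disease of small intestine without perforation",
}

def rank_rare_codes(note_text, predicted_categories, codes, top_k=10):
    """Fuzzy-rank rare codes whose 3-character category was predicted."""
    candidates = {c: d for c, d in codes.items() if c[:3] in predicted_categories}
    scored = [(SequenceMatcher(None, note_text.lower(), d.lower()).ratio(), c)
              for c, d in candidates.items()]
    return sorted(scored, reverse=True)[:top_k]

print(rank_rare_codes("chronic proctitis with rectal bleeding", {"K51"}, rare_codes))
```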

Place, publisher, year, edition, pages
Piscataway: IEEE, 2022
Keywords
Deep learning, fuzzy logic, ICD-10, clinical notes, Swedish, zero-shot, EHR, natural language processing
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-212705 (URN); 10.1109/ICDH55609.2022.00033 (DOI); 000861317700026 (); 2-s2.0-85138053155 (Scopus ID); 978-1-6654-8149-6 (ISBN)
Conference
2022 IEEE International Conference on Digital Health (ICDH 2022), Barcelona, Spain (hybrid), 11-15 July, 2022
Available from: 2022-12-13 Created: 2022-12-13 Last updated: 2022-12-13. Bibliographically approved.
Vakili, T., Lamproudis, A., Henriksson, A. & Dalianis, H. (2022). Downstream Task Performance of BERT Models Pre-Trained Using Automatically De-Identified Clinical Data. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). Paper presented at the Conference on Language Resources and Evaluation (LREC 2022), Marseilles, France, 21-23 June 2022 (pp. 4245-4252). European Language Resources Association
2022 (English). In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), European Language Resources Association, 2022, p. 4245-4252. Conference paper, Published paper (Refereed).
Abstract [en]

Automatic de-identification is a cost-effective and straightforward way of removing large amounts of personally identifiable information from large and sensitive corpora. However, these systems also introduce errors into datasets due to their imperfect precision. These corruptions of the data may negatively impact the utility of the de-identified dataset. This paper de-identifies a very large clinical corpus in Swedish either by removing entire sentences containing sensitive data or by replacing sensitive words with realistic surrogates. These two datasets are used to perform domain adaptation of a general Swedish BERT model. The impact of the de-identification techniques is assessed by training and evaluating the models using six clinical downstream tasks. The results are then compared to a similar BERT model domain-adapted using an unaltered version of the clinical corpus. The results show that using an automatically de-identified corpus for domain adaptation does not negatively impact downstream performance. We argue that automatic de-identification is an efficient way of reducing the privacy risks of domain-adapted models and that the models created in this paper should be safe to distribute to other academic researchers.
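The two corpus variants described here, sentence removal and surrogate replacement, can be sketched as follows; the PHI spans, surrogate lists and example sentences are hypothetical, and a real system would obtain the spans from a trained de-identification NER model.

```python
# Hedged sketch (assumptions, not the study's system): given PHI spans
# detected by an automatic de-identification model, either drop whole
# sentences containing PHI or replace each PHI mention with a realistic
# surrogate, producing the two corpus variants compared in the paper.
import random

SURROGATES = {"FIRST_NAME": ["Anna", "Erik"], "DATE": ["2015-03-12", "2016-11-02"]}

def remove_sentences(sentences, phi_spans):
    """Keep only sentences with no detected PHI."""
    return [s for i, s in enumerate(sentences) if not phi_spans.get(i)]

def replace_with_surrogates(sentences, phi_spans):
    """Replace each detected PHI mention with a random surrogate of the same type."""
    out = []
    for i, s in enumerate(sentences):
        for mention, phi_type in phi_spans.get(i, []):
            s = s.replace(mention, random.choice(SURROGATES[phi_type]))
        out.append(s)
    return out

sents = ["Patienten Karl besökte mottagningen.", "Uppföljning planeras."]
spans = {0: [("Karl", "FIRST_NAME")]}
print(remove_sentences(sents, spans))
print(replace_with_surrogates(sents, spans))
```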

Place, publisher, year, edition, pages
European Language Resources Association, 2022
Keywords
Privacy-preserving machine learning, pseudonymization, de-identification, Swedish clinical text, pre-trained language models, BERT, downstream tasks, NER, multi-label classification
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-207395 (URN)
Conference
Conference on Language Resources and Evaluation (LREC 2022), Marseilles, France, 21-23 June 2022
Available from: 2022-07-15 Created: 2022-07-15 Last updated: 2023-04-24
Lamproudis, A., Henriksson, A. & Dalianis, H. (2022). Evaluating Pretraining Strategies for Clinical BERT Models. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). Paper presented at the Conference on Language Resources and Evaluation (LREC 2022), 21-23 June 2022, Marseille, France (pp. 410-416). European Language Resources Association
2022 (English). In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), European Language Resources Association, 2022, p. 410-416. Conference paper, Published paper (Refereed).
Abstract [en]

Research suggests that using generic language models in specialized domains may be sub-optimal due to significant domain differences. As a result, various strategies for developing domain-specific language models have been proposed, including techniques for adapting an existing generic language model to the target domain, e.g. through various forms of vocabulary modifications and continued domain-adaptive pretraining with in-domain data. Here, an empirical investigation is carried out in which various strategies for adapting a generic language model to the clinical domain are compared to pretraining a pure clinical language model. Three clinical language models for Swedish, pretrained for up to ten epochs, are fine-tuned and evaluated on several downstream tasks in the clinical domain. A comparison of the language models’ downstream performance over the training epochs is conducted. The results show that the domain-specific language models outperform a general-domain language model, although there is little difference in performance between the various clinical language models. However, compared to pretraining a pure clinical language model with only in-domain data, leveraging and adapting an existing general-domain language model requires fewer epochs of pretraining with in-domain data.
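Continued, domain-adaptive pretraining of an existing general-domain model is one of the strategies compared. Below is a hedged sketch using the Hugging Face Transformers masked language modelling setup; the model name and the two-sentence corpus are placeholders, since the actual clinical corpus is not public.

```python
# Hedged sketch (assumptions, not the study's code): continued MLM
# pretraining of a generic Swedish BERT on in-domain clinical text.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "KB/bert-base-swedish-cased"            # assumed generic model used for initialization
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

clinical_texts = ["Pat inkommer med buksmärta.", "CRP 120, antibiotika insatt."]  # placeholder corpus
enc = tokenizer(clinical_texts, truncation=True, max_length=128)
dataset = [{"input_ids": ids, "attention_mask": mask}
           for ids, mask in zip(enc["input_ids"], enc["attention_mask"])]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="swedish-clinical-bert",
                           num_train_epochs=1, per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
model.save_pretrained("swedish-clinical-bert")
```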

Place, publisher, year, edition, pages
European Language Resources Association, 2022
Keywords
language models, domain-adaptive pretraining, Swedish clinical text
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-207397 (URN)
Conference
Conference on Language Resources and Evaluation (LREC 2022), 21-23 June 2022, Marseille, France.
Available from: 2022-07-15 Created: 2022-07-15 Last updated: 2023-12-04. Bibliographically approved.
Dolk, A., Davidsen, H., Dalianis, H. & Vakili, T. (2022). Evaluation of LIME and SHAP in Explaining Automatic ICD-10 Classifications of Swedish Gastrointestinal Discharge Summaries. In: André Henriksen; Elia Gabarron; Vivian Vimarlund (Ed.), Proceedings of the 18th Scandinavian Conference on Health Informatics. Paper presented at the 18th Scandinavian Conference on Health Informatics - SHI 2022, Tromsø, Norway, August 22-24, 2022 (pp. 166-173). Linköping University Electronic Press
2022 (English). In: Proceedings of the 18th Scandinavian Conference on Health Informatics / [ed] André Henriksen; Elia Gabarron; Vivian Vimarlund, Linköping University Electronic Press, 2022, p. 166-173. Conference paper, Published paper (Refereed).
Abstract [en]

A computer-assisted coding tool could alleviate the burden on medical staff of assigning ICD diagnosis codes to discharge summaries by using deep learning models to generate recommendations. However, the opaque nature of deep learning models makes it hard for humans to trust them. In this study, the explainable AI methods LIME and SHAP were applied to the clinical language model SweDeClin-BERT to explain ICD-10 codes assigned to Swedish gastrointestinal discharge summaries. The explanations were evaluated by eight medical experts, showing a statistically significant difference in explainability performance in favour of SHAP compared to LIME.
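For context, the sketch below shows how LIME is typically applied to a text classifier to highlight influential tokens; the classifier here is a random stub so the example runs standalone, whereas the study used SweDeClin-BERT and real discharge summaries (SHAP is applied analogously).

```python
# Hedged sketch (assumptions, not the study's setup): using LIME to list
# the tokens that most influence a text classifier's ICD-10 prediction.
import numpy as np
from lime.lime_text import LimeTextExplainer

icd_labels = ["K51", "K57", "K80"]   # toy label set

def predict_proba(texts):
    """Stand-in for the real model's predict_proba: one probability row per text."""
    rng = np.random.default_rng(0)
    probs = rng.random((len(texts), len(icd_labels)))
    return probs / probs.sum(axis=1, keepdims=True)

explainer = LimeTextExplainer(class_names=icd_labels)
explanation = explainer.explain_instance(
    "chronic proctitis with rectal bleeding, colonoscopy performed",
    predict_proba, num_features=5, labels=[0])
print(explanation.as_list(label=0))   # (token, weight) pairs for the first label
```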

Place, publisher, year, edition, pages
Linköping University Electronic Press, 2022
Series
Linköping Electronic Conference Proceedings, ISSN 1650-3686, E-ISSN 1650-3740
Keywords
ICD-10 diagnosis code, Natural language processing, eXplainable AI, Multi-label text classification
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-209696 (URN); 10.3384/ecp187028 (DOI); 978-91-7929-344-4 (ISBN)
Conference
The 18th Scandinavian Conference on Health Informatics - SHI 2022, Tromsø, Norway, August 22-24, 2022
Available from: 2022-09-23 Created: 2022-09-23 Last updated: 2022-09-26. Bibliographically approved.
Karlsson Valik, J., Mellhammar, L., Sundén-Cullberg, J., Ward, L., Unge, C., Dalianis, H., . . . Nauclér, P. (2022). Peripheral Oxygen Saturation Facilitates Assessment of Respiratory Dysfunction in the Sequential Organ Failure Assessment Score With Implications for the Sepsis-3 Criteria. Critical Care Medicine, 50(3), e272-e283
2022 (English). In: Critical Care Medicine, ISSN 0090-3493, E-ISSN 1530-0293, Vol. 50, no 3, p. e272-e283. Article in journal (Refereed). Published.
Abstract [en]

OBJECTIVES:

The Sequential Organ Failure Assessment score is the basis of the Sepsis-3 criteria and requires arterial blood gas analysis to assess respiratory function. Peripheral oxygen saturation is a noninvasive alternative but is included in neither the Sequential Organ Failure Assessment score nor Sepsis-3. We aimed to assess the association between the worst peripheral oxygen saturation during onset of suspected infection and mortality.

DESIGN:

Cohort study of hospital admissions from a main cohort and emergency department visits from four external validation cohorts between 2011 and 2018. Data were collected from electronic health records and prospectively by study investigators.

SETTING:

Eight academic and community hospitals in Sweden and Canada.

PATIENTS:

Adult patients with suspected infection episodes.

INTERVENTIONS:

None.

MEASUREMENTS AND MAIN RESULTS:

The main cohort included 19,396 episodes (median age, 67.0 [53.0–77.0]; 9,007 [46.4%] women; 1,044 [5.4%] died). The validation cohorts included 10,586 episodes (range of median age, 61.0–76.0; women 42.1–50.2%; mortality 2.3–13.3%). Peripheral oxygen saturation levels of 96–95% were not significantly associated with increased mortality in the main or pooled validation cohorts. At a peripheral oxygen saturation of 94%, the adjusted odds ratio of death was 1.56 (95% CI, 1.10–2.23) in the main cohort and 1.36 (95% CI, 1.00–1.85) in the pooled validation cohorts, and it increased gradually below this level. Respiratory assessment using peripheral oxygen saturation of 94–91% and less than 91% to generate 1 and 2 Sequential Organ Failure Assessment points, respectively, improved the discrimination of the Sequential Organ Failure Assessment score from an area under the receiver operating characteristic curve of 0.75 (95% CI, 0.74–0.77) to 0.78 (95% CI, 0.77–0.80; p < 0.001). The peripheral oxygen saturation/Fio2 ratio had slightly better predictive performance than peripheral oxygen saturation alone, but the clinical impact was minor.

CONCLUSIONS:

These findings provide evidence for assessing respiratory function with peripheral oxygen saturation in the Sequential Organ Failure Assessment score and the Sepsis-3 criteria. Our data support using peripheral oxygen saturation thresholds of 94% and 90% to assign 1 and 2 Sequential Organ Failure Assessment respiratory points, respectively. This has important implications primarily for emergency practice, rapid response teams, surveillance, research, and resource-limited settings.
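The suggested thresholds translate directly into a small scoring rule; the sketch below illustrates those cut-offs only and is not a validated scoring implementation.

```python
# Hedged sketch of the thresholds suggested in the conclusions:
# SpO2 below 91% gives 2 SOFA respiratory points, 91-94% gives 1 point,
# and anything above 94% gives 0 points.
def spo2_sofa_respiratory_points(spo2_percent: float) -> int:
    if spo2_percent < 91:
        return 2
    if spo2_percent <= 94:
        return 1
    return 0

for spo2 in (98, 94, 92, 89):
    print(spo2, "->", spo2_sofa_respiratory_points(spo2))
```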

Keywords
critical illness, infections, pulse oximetry, respiratory failure, sepsis, Sequential Organ Failure Assessment scores
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-200506 (URN); 10.1097/CCM.0000000000005318 (DOI); 000759057200006 ()
Available from: 2022-01-06 Created: 2022-01-06 Last updated: 2022-03-24. Bibliographically approved.
Van Der Werff, S. D., Fritzing, M., Tanushi, H., Henriksson, A., Dalianis, H., Ternhag, A., . . . Nauclér, P. (2022). The accuracy of fully automated algorithms for surveillance of healthcare-onset Clostridioides difficile infections in hospitalized patients. Antimicrobial Stewardship and Healthcare Epidemiology, 2(1), 1-4, Article ID e43.
2022 (English). In: Antimicrobial Stewardship and Healthcare Epidemiology, ISSN 2732-494X, Vol. 2, no 1, p. 1-4, article id e43. Article in journal (Refereed). Published.
Abstract [en]

We developed and validated a set of fully automated surveillance algorithms for healthcare-onset Clostridioides difficile infection (CDI) using electronic health records. In a validation data set of 750 manually annotated admissions, the algorithm based on the International Classification of Diseases, Tenth Revision (ICD-10) code A04.7 had insufficient sensitivity. Algorithms based on microbiological test results, with or without the addition of symptoms, performed well.
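A fully automated rule of this kind can be sketched as follows; the 48-hour onset threshold and the record layout are illustrative assumptions, and the paper's exact algorithm definitions differ in detail.

```python
# Hedged sketch (assumptions, not the study's algorithm): classify an
# admission as healthcare-onset CDI when a positive C. difficile test is
# taken more than an assumed 48 hours after admission.
from datetime import datetime, timedelta

def healthcare_onset_cdi(admission_time, positive_cdi_test_times,
                         onset_threshold=timedelta(hours=48)):
    """Return True if any positive C. difficile test falls after the threshold."""
    return any(t - admission_time > onset_threshold for t in positive_cdi_test_times)

admitted = datetime(2022, 3, 1, 10, 0)
tests = [datetime(2022, 3, 4, 8, 30)]
print(healthcare_onset_cdi(admitted, tests))  # True: test taken about 3 days after admission
```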

National Category
Mathematics
Identifiers
urn:nbn:se:su:diva-208772 (URN); 10.1017/ash.2022.32 (DOI); 2-s2.0-85129906677 (Scopus ID)
Available from: 2022-09-06 Created: 2022-09-06 Last updated: 2022-09-06. Bibliographically approved.
Vakili, T. & Dalianis, H. (2022). Utility Preservation of Clinical Text After De-Identification. In: Dina Demner-Fushman; Kevin Bretonnel Cohen; Sophia Ananiadou; Junichi Tsujii (Ed.), Proceedings of the 21st Workshop on Biomedical Language Processing. Paper presented at the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22-27 May 2022 (pp. 383-388). Association for Computational Linguistics
2022 (English). In: Proceedings of the 21st Workshop on Biomedical Language Processing / [ed] Dina Demner-Fushman; Kevin Bretonnel Cohen; Sophia Ananiadou; Junichi Tsujii, Association for Computational Linguistics, 2022, p. 383-388. Conference paper, Published paper (Refereed).
Abstract [en]

Electronic health records contain valuable information about the symptoms, diagnoses, treatments and treatment outcomes of individual patients. However, the records may also contain information that can reveal the identity of the patients. Removing these identifiers, the Protected Health Information (PHI), can protect the identity of the patient. Automatic de-identification is a process which employs machine learning techniques to detect and remove PHI. However, automatic techniques are imperfect in their precision and introduce noise into the data. This study examines the impact of this noise on the utility of Swedish de-identified clinical data by using human evaluators and by training and testing BERT models. Our results indicate that de-identification does not harm the utility for clinical NLP and that human evaluators are less sensitive to noise from de-identification than expected.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2022
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-207402 (URN); 10.18653/v1/2022.bionlp-1.38 (DOI); 978-1-955917-27-8 (ISBN)
Conference
60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22-27 May 2022
Available from: 2022-07-15 Created: 2022-07-15 Last updated: 2023-04-24. Bibliographically approved.
Lamproudis, A., Henriksson, A. & Dalianis, H. (2022). Vocabulary Modifications for Domain-adaptive Pretraining of Clinical Language Models. In: Nathalie Bier; Ana Fred; Hugo Gamboa (Ed.), Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF. Paper presented at the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022), 9-11 February 2022, Online (pp. 180-188). SciTePress
2022 (English). In: Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF / [ed] Nathalie Bier; Ana Fred; Hugo Gamboa, SciTePress, 2022, p. 180-188. Conference paper, Published paper (Refereed).
Abstract [en]

Research has shown that using generic language models – specifically, BERT models – in specialized domains may be sub-optimal due to domain differences in language use and vocabulary. There are several techniques for developing domain-specific language models that leverage the use of existing generic language models, including continued and domain-adaptive pretraining with in-domain data. Here, we investigate a strategy based on using a domain-specific vocabulary, while leveraging a generic language model for initialization. The results demonstrate that domain-adaptive pretraining, in combination with a domain-specific vocabulary – as opposed to a general-domain vocabulary – yields improvements on two downstream clinical NLP tasks for Swedish. The results highlight the value of domain-adaptive pretraining when developing specialized language models and indicate that it is beneficial to adapt the vocabulary of the language model to the target domain prior to continued, domain-adaptive pretraining of a generic language model.
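A sketch of the vocabulary-replacement step using Hugging Face tokenizers is shown below; the model name and the toy corpus are placeholders, and how embeddings for the new vocabulary are initialized (randomly or copied over for overlapping tokens) is a separate design choice that this sketch does not resolve.

```python
# Hedged sketch (assumptions, not the paper's implementation): train a new
# in-domain WordPiece vocabulary from a generic Swedish BERT's tokenizer,
# then make the model's embedding matrix match it before continued,
# domain-adaptive pretraining.
from transformers import AutoTokenizer, AutoModelForMaskedLM

clinical_corpus = ["Pat inkommer med buksmärta och feber.",
                   "CRP 120, påbörjar iv antibiotika."]   # placeholder corpus

base = "KB/bert-base-swedish-cased"                       # assumed generic model for initialization
old_tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Train a new vocabulary of the same size on the in-domain corpus.
new_tokenizer = old_tokenizer.train_new_from_iterator(clinical_corpus,
                                                      vocab_size=old_tokenizer.vocab_size)

# Make the embedding matrix match the new vocabulary size; continued
# pretraining with in-domain data would follow from here.
model.resize_token_embeddings(len(new_tokenizer))
new_tokenizer.save_pretrained("clinical-tokenizer")
```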

Place, publisher, year, edition, pages
SciTePress, 2022
Series
Biostec, ISSN 2184-349X, E-ISSN 2184-4305
Keywords
Natural Language Processing, Language Models, Domain-adaptive Pretraining, Clinical Text, Swedish
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-207403 (URN); 10.5220/0010893800003123 (DOI); 978-989-758-552-4 (ISBN)
Conference
The 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022), 9 - 11 February, 2022, Online
Available from: 2022-07-15 Created: 2022-07-15 Last updated: 2022-08-23. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0003-0165-9926
