Data-Driven AI for Patient and Public Health: On the Use of Multisource and Multimodal Data in Machine Learning to Improve Healthcare
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The integration of artificial intelligence in healthcare has created a new era of advancements, reshaping patient care and revolutionizing public health interventions. Through artificial intelligence, healthcare providers and public health authorities can optimize interventions, leading to more precise and efficient responses that enhance patient outcomes and address public health challenges effectively. The past decade has witnessed a rapid digital transformation across industries, and healthcare is no exception. This evolution is evident in the widespread adoption of electronic health records and healthcare information systems and the integration of diverse technologies, including handheld, wearable, and smart devices.

A central challenge in this digital shift lies in representing data from multiple sources and modalities for downstream machine learning tasks. This complexity stems from the varied longitudinal or contextual events in patients' historical records, encompassing lab tests, vital signs, diagnoses, and drug administration. Additionally, the challenge extends to predictive modeling and constructing robust models that accurately classify future health events, taking into consideration heterogeneous health-related data. Electronic phenotyping, crucial for identifying fine-grained disease/patient clusters, is also a central problem when utilizing multisource and multimodal information effectively to create meaningful patient profiles. In the context of public health interventions, exemplified by crises like the COVID-19 pandemic, decision-making requires a delicate balance between optimizing intervention effectiveness and considering economic and societal well-being.

This Ph.D. thesis seeks to unravel the potential of multisource and multimodal health observational data in generating patient phenotypes and predictions for both individual health and public health surveillance. It addresses the following central question: How can multisource and multimodal observational health data be effectively harnessed, using machine learning, to enhance patient and public health? Comprising five studies, the thesis confronts challenges posed by diverse data sources and modalities, exploring strategies for creating comprehensive patient profiles, developing robust classification models, and employing clustering methods tailored to observational health data. The research seeks to provide valuable insights into integrating AI in healthcare, with a specific emphasis on the complexities of multisource and multimodal data integration. It underscores the importance of exploring heterogeneous health observational data to deepen our understanding of patient health and optimize machine learning applications. Emphasizing the intricate nature of health data, the thesis discusses careful data handling and innovative methodologies to maximize its potential impact on improving patient outcomes and informing public health strategies. The effective management of heterogeneous observational health data requires thoughtful consideration due to their varied sources and inherent complexities.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2024, p. 86
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 24-009
Keywords [en]
Machine Learning; Artificial Intelligence; Healthcare; Multimodal Data; Complex Data
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-231336; ISBN: 978-91-8014-847-4 (print); ISBN: 978-91-8014-848-1 (electronic); OAI: oai:DiVA.org:su-231336; DiVA id: diva2:1873123
Public defence
2024-09-06, L30, NOD-huset, Borgarfjordsgatan 12, Kista, 09:00 (English)
Available from: 2024-08-14. Created: 2024-06-18. Last updated: 2024-08-19. Bibliographically approved.
List of papers
1. Mining Adverse Drug Events Using Multiple Feature Hierarchies and Patient History Windows
2019 (English). In: 19th IEEE International Conference on Data Mining Workshops: Proceedings / [ed] Panagiotis Papapetrou, Xueqi Cheng, Qing He, IEEE, 2019. Conference paper, Published paper (Refereed).
Abstract [en]

We study the problem of detecting adverse drug events (ADEs) in electronic health records. The challenge in this work is to aggregate heterogeneous data types involving lab measurements, diagnosis codes, and medication codes. An earlier framework proposed for the same problem demonstrated promising predictive performance for the random forest classifier by using only lab measurements as data features. We extend this framework by including diagnosis and drug prescription codes alongside the lab measurements. In addition, we employ the concept of hierarchies of clinical codes, as proposed by another work, in order to exploit the inherently complex nature of the medical data. Moreover, we extend the state of the art by considering variable patient history lengths before the occurrence of an ADE rather than a patient history of an arbitrary length. Our experimental evaluation on eight medical datasets of adverse drug events, five different patient history lengths, and six different classifiers suggests that the integration of these additional features on the different window lengths provides significant improvements in terms of AUC while employing medically relevant features.
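
The notion of a variable patient history window can be illustrated with a small sketch. This is not the paper's pipeline: the record layout (a list of `(date, code)` tuples), the function name `events_in_window`, and the example codes are all assumptions made purely for illustration.

```python
from datetime import date, timedelta


def events_in_window(events, index_date, window_days):
    """Return events recorded within `window_days` before `index_date`.

    `events` is a list of (date, code) tuples (hypothetical layout).
    The index date itself is excluded, so only history preceding the
    event of interest is kept.
    """
    start = index_date - timedelta(days=window_days)
    return [ev for ev in events if start <= ev[0] < index_date]


# Made-up example history; the codes are placeholders, not real data.
history = [
    (date(2018, 12, 1), "LAB:example-test"),
    (date(2019, 1, 20), "DX:example-diagnosis"),
    (date(2019, 1, 31), "RX:example-drug"),
]
ade_date = date(2019, 1, 31)

short_history = events_in_window(history, ade_date, 30)
long_history = events_in_window(history, ade_date, 90)
```

Varying `window_days` yields differently sized feature sets for the same patient, which is the kind of comparison across history lengths the abstract describes.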

Place, publisher, year, edition, pages
IEEE, 2019
Series
IEEE International Conference on Data Mining workshops, ISSN 2375-9232, E-ISSN 2375-9259
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-178340; DOI: 10.1109/ICDMW.2019.00135; ISBN: 978-1-7281-4897-7; ISBN: 978-1-7281-4896-0
Conference
19th IEEE International Conference on Data Mining Workshops (ICDMW), Beijing, China, 8–11 November, 2019
Available from: 2020-01-24. Created: 2020-01-24. Last updated: 2024-06-18. Bibliographically approved.
2. A clustering framework for patient phenotyping with application to adverse drug events
2020 (English). In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2020, p. 177-182. Conference paper, Published paper (Refereed).
Abstract [en]

We present a clustering framework for identifying patient groups with Adverse Drug Reactions from Electronic Health Records (EHRs). The increased adoption of EHRs has brought changes in the way drug safety surveillance is carried out and plays an important role in effective drug regulation. Unsupervised machine learning methods using EHRs as their input can identify patients that share common meaningful information, without the need for expert input. In this work, we propose a generalized framework that exploits the strengths of different clustering algorithms and via clustering aggregation identifies consensus patient cluster profiles. Moreover, the inherent hierarchical structure of diagnoses and medication codes is exploited. We assess the statistical significance of the produced clusterings by applying a randomization technique that keeps the data distribution margins fixed, as we are interested in evaluating information that is not conveyed by the marginal distributions. The experimental findings suggest that the framework produces medically meaningful patient groups with regard to adverse drug events by investigating two use-cases, i.e., aplastic anaemia and drug-induced skin eruption.

Place, publisher, year, edition, pages
IEEE, 2020
Series
IEEE International Symposium on Computer-Based Medical Systems, ISSN 2372-918X, E-ISSN 2372-9198
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-186974; DOI: 10.1109/CBMS49503.2020.00041; ISBN: 978-1-7281-9429-5; ISBN: 978-1-7281-9430-1
Conference
Computer-Based Medical Systems, Rochester, USA, 28-30 July, 2020
Available from: 2020-11-30. Created: 2020-11-30. Last updated: 2024-06-18. Bibliographically approved.
3. EpidRLearn: Learning Intervention Strategies for Epidemics with Reinforcement Learning
2022 (English). In: Artificial Intelligence in Medicine: 20th International Conference on Artificial Intelligence in Medicine, AIME 2022, Halifax, NS, Canada, June 14–17, 2022, Proceedings / [ed] Martin Michalowski; Syed Sibte Raza Abidi; Samina Abidi, Springer Nature, 2022, p. 189-199. Conference paper, Published paper (Refereed).
Abstract [en]

Epidemics of infectious diseases can pose a serious threat to public health and the global economy. Despite scientific advances, containment and mitigation of infectious diseases remain a challenging task. In this paper, we investigate the potential of reinforcement learning as a decision making tool for epidemic control by constructing a deep Reinforcement Learning simulator, called EpidRLearn, composed of a contact-based, age-structured extension of the SEIR compartmental model, referred to as C-SEIR. We evaluate EpidRLearn by comparing the learned policies to two deterministic policy baselines. We further assess our reward function by integrating an alternative reward into our deep RL model. The experimental evaluation indicates that deep reinforcement learning has the potential of learning useful policies under complex epidemiological models and large state spaces for the mitigation of infectious diseases, with a focus on COVID-19.
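
For orientation, the basic SEIR compartmental model that C-SEIR extends can be sketched as follows. This is a plain single-population SEIR advanced with forward-Euler steps, not the contact-based, age-structured C-SEIR or the reinforcement learning environment from the paper; the function names and every parameter value are illustrative assumptions.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """Advance a basic SEIR model by one forward-Euler step.

    beta:  transmission rate, sigma: rate of leaving the exposed state,
    gamma: recovery rate. All values used below are assumptions.
    """
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt      # S -> E
    new_infectious = sigma * e * dt          # E -> I
    new_recovered = gamma * i * dt           # I -> R
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def simulate(days, s0, e0, i0, r0, beta, sigma, gamma):
    """Return the list of (S, E, I, R) states for day 0 through `days`."""
    state = (s0, e0, i0, r0)
    trajectory = [state]
    for _ in range(days):
        state = seir_step(*state, beta, sigma, gamma)
        trajectory.append(state)
    return trajectory


# Illustrative run: 1000 people, 10 initially infectious.
trajectory = simulate(30, 990.0, 0.0, 10.0, 0.0, beta=0.3, sigma=0.2, gamma=0.1)
```

In a reinforcement learning setting, an intervention would typically act on a quantity such as `beta` (for example by reducing contacts), with the reward trading off infections against the cost of the intervention; how EpidRLearn does this is specified in the paper itself, not here.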

Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349
Keywords
Reinforcement learning, Mitigation policies, COVID-19
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-209701; DOI: 10.1007/978-3-031-09342-5_18; Scopus ID: 2-s2.0-85135037656; ISBN: 978-3-031-09342-5
Conference
20th International Conference on Artificial Intelligence in Medicine, AIME 2022, Halifax, Canada, June 14–17, 2022
Available from: 2022-09-23. Created: 2022-09-23. Last updated: 2024-06-18. Bibliographically approved.
4. Machine learning models for automated interpretation of 12-lead electrocardiographic signals: a narrative review of techniques, challenges, achievements and clinical relevance
2023 (English). In: Journal of medical artificial intelligence, E-ISSN 2617-2496, Vol. 6, article id 6. Article in journal (Refereed), Published.
Abstract [en]

Background and Objective: Novel advances in machine learning (ML) and its subfield, deep learning (DL), as well as the recent release of large-scale electrocardiogram (ECG) databases, have driven a sharp increase in research related to automated ECG interpretation. This review aims to summarize the recent ML approaches for automatically interpreting standard 12-lead ECG signals.

Methods: We searched 10 indexing databases for original research in English on the application of ML/DL techniques to the analysis of raw 12-lead ECG signals. The retrieved titles were filtered based on their relevance. The results were summarized and reported.

Key Content and Findings: More than 80% of studies integrated a DL approach, while fewer attempts applied a feature extraction method to obtain inputs for training a simple ML classifier. The average diagnostic accuracy was as high as 90%, while several other performance metrics, such as the area under the curve (AUC), F1-score, sensitivity and specificity, were also employed. DL models generally demanded 10 times more samples for training but were capable of better handling multi-class problems. The most frequently involved disease (49% of studies) was myocardial infarction (MI), while atrial fibrillation (AF) was encountered in more than one-third of studies. Various datasets were used for training and testing, constituting either private collections or publicly available databanks [such as the “Physikalisch-Technische Bundesanstalt” (PTB) dataset and datasets derived from the “China Physiological Signal Challenge” and the “Computing in Cardiology Challenge”]. Overall, DL and simpler ML approaches for automated ECG interpretation display unprecedented growth, reaching remarkably high performances.

Conclusions: While such novel tools can support clinicians in reaching reliable diagnoses for life-threatening conditions on the spot, concerns regarding their accountability do exist. Generalizability of the developed approaches is still an issue, possibly mitigable with the extensive deployment of developed models, so as to become massively accessible and validatable. Finally, the observed heterogeneity of the various attempts underlines the need for transparency and reproducibility in the development processes.

Keywords
ECG, electrocardiogram, machine learning (ML), deep learning (DL)
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-224559; DOI: 10.21037/jmai-22-94; Scopus ID: 2-s2.0-85166205680
Available from: 2023-12-18. Created: 2023-12-18. Last updated: 2024-06-18. Bibliographically approved.
5. M-ClustEHR: A multimodal clustering approach for electronic health records
2024 (English). In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 154, article id 102905. Article in journal (Refereed), Published.
Abstract [en]

Sepsis refers to a potentially life-threatening situation where the immune system of the human body has an extreme response to an infection. In the presence of underlying comorbidities, the situation can become even worse and result in death. Employing unsupervised machine learning techniques, such as clustering, can assist in providing a better understanding of patient phenotypes by unveiling subgroups characterized by distinct sepsis progression and treatment patterns. More concretely, this study introduces M-ClustEHR, a clustering approach that utilizes medical data of multiple modalities by employing a multimodal autoencoder for learning comprehensive sepsis patient representations. M-ClustEHR consistently outperforms traditional clustering approaches in terms of several internal clustering performance metrics, as well as cluster stability in identifying phenotypes in the sepsis cohort. The unveiled patterns, supported by existing medical literature and clinicians, highlight the importance of multimodal clustering for advancing personalized sepsis care.

Keywords
Clustering, Deep learning, Electronic health records
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-231314; DOI: 10.1016/j.artmed.2024.102905; PubMed ID: 38908256; Scopus ID: 2-s2.0-85196386507
Available from: 2024-06-18. Created: 2024-06-18. Last updated: 2024-07-01. Bibliographically approved.

Open Access in DiVA

Data-Driven AI for Patient and Public Health (2767 kB), 88 downloads
File information
File name: FULLTEXT03.pdf
File size: 2767 kB
Checksum (SHA-512): 8dd44bedf2cdd1d49b2176797d40ed6d06c7c77c3f13aa78e44cc8621a24fcf3c49e609394b75122aa135e9f7ffe69fda0a99fa9ef0717c7862e1348322168b5
Type: fulltext
Mimetype: application/pdf

Authority records

Bampa, Maria
