Machine Learning Tools to Identify Risk Drivers in Water
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]
Due to the increasing number of chemicals used in our daily lives, more and more chemicals end up in the environment. Many such contaminants accumulate in water, with thousands of chemicals detected in environmental water samples using liquid chromatography – high-resolution mass spectrometry (LC/HRMS). As a result, all water-dependent organisms are exposed to a large number of low-concentration chemicals, while the health effects of such exposures are unknown. Unfortunately, only a small fraction of the detected chemicals is identified and can be further investigated for their effects on organisms.
This thesis investigated whether experimental data from such detected but unidentified chemicals can be used to predict their environmental concentration levels, their toxicity, and their risk, which combines the two. First, in paper I, trends in risk estimation for chemicals detected in water samples were investigated across the years 2019 to 2022. The analysis indicated that risk was considered in only 13% of the papers. In paper II, a concentration prediction model, MS2Quant, was developed, allowing concentration prediction for unidentified chemicals based on tandem mass spectra. The experimental data-based concentration predictions were comparable with structure-based predictions. Further, in paper III, the predictions from the MS2Quant model were combined with the in-house developed MS2Tox model for adult fish acute toxicity predictions in order to prioritize features in wastewater samples. While the feature set of the effluent samples was reduced by 73% to 99%, the subsequent structural assignment with library matching and in silico tools could not assign a probable structure to the majority of the prioritized features, highlighting the advantages of incorporating experimental data-based methods in the analysis. Finally, paper IV focused on the experimental validation of mixture toxicity predictions. For this, a complementary fish embryo acute toxicity model was developed, and the toxicity values were experimentally validated for eight chemicals. Combined with concentration predictions, the cumulative mixture toxicity was predicted with a 3× geometric mean error.
The tools developed, investigated, and validated in this thesis showcase the possibility of using available experimental data together with machine learning approaches for exposure and toxicity predictions of unidentified features. They allow looking into a larger subset of detected chemicals for subsequent tandem mass spectra-based prioritization of features that are more likely to cause harm and need immediate attention.
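As a rough illustration of how predicted concentrations and toxicity values might be combined into a mixture-level estimate and scored, the sketch below sums toxic units under a concentration-addition assumption and computes a geometric mean fold error. Both choices are assumptions made here for illustration; the abstract does not state how the thesis combines the predictions or defines the error. The function names and any input values are hypothetical.

```python
import math


def toxic_units(concentrations, lc50s):
    """Sum of concentration / LC50 ratios.

    Assumes concentration addition; both inputs must share the same unit.
    """
    return sum(c / lc50 for c, lc50 in zip(concentrations, lc50s))


def geometric_mean_fold_error(predicted, measured):
    """Geometric mean of per-item fold errors.

    Each fold error is treated symmetrically (over- and under-prediction
    by the same factor count equally), so the result is always >= 1.
    """
    logs = [abs(math.log10(p / m)) for p, m in zip(predicted, measured)]
    return 10 ** (sum(logs) / len(logs))
```

Under this definition, a value of 3 would mean that predictions deviate from the measured values by a factor of three on average, in either direction.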
Place, publisher, year, edition, pages
Stockholm: Department of Chemistry, Stockholm University, 2026, p. 52
Keywords [en]
non-targeted screening, mass spectrometry, liquid chromatography, machine learning, exposure, toxicity, risk, prioritization
National Category
Analytical Chemistry
Research subject
Analytical Chemistry
Identifiers
URN: urn:nbn:se:su:diva-253770
ISBN: 978-91-8107-574-8 (print)
ISBN: 978-91-8107-575-5 (electronic)
OAI: oai:DiVA.org:su-253770
DiVA, id: diva2:2049408
Public defence
2026-05-15, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16B, Stockholm, 09:00 (English)
Opponent
Supervisors
2026-04-22 2026-03-30 2026-04-14 Bibliographically approved
List of papers