Publications (10 of 35)
Wang, W.-C., Amini, N., Huber, C., Kull, M. & Kruve, A. (2025). Active Learning Improves Ionization Efficiency Predictions and Quantification in Nontargeted LC/HRMS. Analytical Chemistry, 97(25), 13131-13139
2025 (English). In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 97, no 25, p. 13131-13139. Article in journal (Refereed). Published.
Abstract [en]

Liquid chromatography electrospray ionization high-resolution mass spectrometry (LC/ESI/HRMS) is frequently employed in nontargeted screening (NTS) due to its high selectivity and sensitivity. However, data interpretation is challenging since the number of chemical standards available for quantification is limited and the response of the chemicals vastly differs depending on their structure and analysis conditions. Therefore, machine learning (ML) models have been utilized to predict ionization efficiency (IE) and enable the quantification of detected chemicals. It has been observed that the error in the predictions is high for chemicals structurally different from the training data. To enlarge the training set and to accurately predict the IE given a limited labeling budget, active learning (AL) is proposed to acquire informative data points from the targeted chemical space. In the current study, four AL approaches (clustering-based, uncertainty-based, mix, and anticlustering) and a baseline approach (random) were evaluated for IE prediction. The RMSE of the IE in the targeted space dropped significantly (up to 0.3 log units) after a single AL iteration, highlighting the necessity of chemical space exploration before ML model execution. Clustering-based AL reduced the RMSE least, while the uncertainty-based AL was inefficient if ten or more chemicals were sampled in one iteration, thereby reducing its practicality. Finally, expanding the chemical space improved the quantification accuracy from a fold error of 4.13× to 2.94× for five natural products in Alpinia officinarum, thereby demonstrating the need for updating the chemical space coverage of the training set.

National Category
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-245751 (URN); 10.1021/acs.analchem.5c00816 (DOI); 001510196500001 (); 2-s2.0-105008396978 (Scopus ID)
Available from: 2025-08-25. Created: 2025-08-25. Last updated: 2025-08-25. Bibliographically approved.
Hupatz, H., Rahu, I., Wang, W.-C., Peets, P., Palm, E. H. & Kruve, A. (2025). Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Analytical and Bioanalytical Chemistry, 417(3), 473-493
2025 (English). In: Analytical and Bioanalytical Chemistry, ISSN 1618-2642, E-ISSN 1618-2650, Vol. 417, no 3, p. 473-493. Article, review/survey (Refereed). Published.
Abstract [en]

Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information either from spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of the candidate structures benefits from the complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we showcase in the example of structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.

Keywords
Generative modeling, Machine learning, Non-targeted analysis, Non-targeted screening, Suspect screening, Untargeted screening
National Category
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-239112 (URN); 10.1007/s00216-024-05471-x (DOI); 001290127000002 (); 39138659 (PubMedID); 2-s2.0-85203470144 (Scopus ID)
Available from: 2025-02-06. Created: 2025-02-06. Last updated: 2025-10-01. Bibliographically approved.
Malm, L. & Kruve, A. (2025). Do experimental projection methods outcompete retention time prediction models in non-target screening? A case study on LC/HRMS interlaboratory comparison data. The Analyst, 150(16), 3567-3577
2025 (English). In: The Analyst, ISSN 0003-2654, E-ISSN 1364-5528, Vol. 150, no 16, p. 3567-3577. Article in journal (Refereed). Published.
Abstract [en]

Retention time (RT) is essential in evaluating the likelihood of candidate structures in nontarget screening (NTS) with liquid chromatography high-resolution mass spectrometry (LC/HRMS). Approaches for estimating the RTs of candidate structures can broadly be divided into projection and prediction methods. The first approach takes advantage of public databases of RTs measured on similar chromatographic systems (CSsource) and projects these to the chromatographic system applied in the NTS (CSNTS) based on a small set of commonly analyzed chemicals. The second approach leverages machine learning (ML) model(s) trained on publicly available retention time data measured on one or more chromatographic systems (CStraining). Nevertheless, the CSsource and CStraining might differ substantially from CSNTS. Therefore, it is of interest to evaluate the generalizability of projection models and prediction models in CSs routinely applied in NTS. Here we take advantage of the recent NORMAN interlaboratory comparison where 41 known calibration chemicals and 45 suspects were analyzed to evaluate both the projection and prediction approaches on 37 CSs. The accuracy of both approaches was directly linked to the similarity of the CS, and the pH of the mobile phase and the column chemistry were found to be most impactful. Furthermore, for cases where CSsource and CSNTS differ substantially but CStraining and CSNTS are similar, prediction models often performed on par with the projection models. These findings highlight the need to account for the mobile phase and column chemistry both when training ML models and when selecting the prediction model for RT.

National Category
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-246095 (URN); 10.1039/d5an00323g (DOI); 001530106300001 (); 40671565 (PubMedID); 2-s2.0-105010925188 (Scopus ID)
Available from: 2025-08-28. Created: 2025-08-28. Last updated: 2025-08-28. Bibliographically approved.
Meekel, N., Kruve, A., Lamoree, M. H. & Been, F. M. (2025). Machine Learning-based Classification for the Prioritization of Potentially Hazardous Chemicals with Structural Alerts in Nontarget Screening. Environmental Science and Technology, 59(10), 5056-5065
2025 (English). In: Environmental Science and Technology, ISSN 0013-936X, E-ISSN 1520-5851, Vol. 59, no 10, p. 5056-5065. Article in journal (Refereed). Published.
Abstract [en]

Nontarget screening (NTS) with liquid chromatography high-resolution mass spectrometry (LC-HRMS) is commonly used to detect unknown organic micropollutants in the environment. One of the main challenges in NTS is the prioritization of relevant LC-HRMS features. A novel prioritization strategy based on structural alerts to select NTS features that correspond to potentially hazardous chemicals is presented here. This strategy leverages raw tandem mass spectra (MS2) and machine learning models to predict the probability that NTS features correspond to chemicals with structural alerts. The models were trained on fragments and neutral losses from the experimental MS2 data. The feasibility of this approach is evaluated for two groups: aromatic amines and organophosphorus structural alerts. The neural network classification model for organophosphorus structural alerts achieved an Area Under the Curve of the Receiver Operating Characteristics (AUC-ROC) of 0.97 and a true positive rate of 0.65 on the test set. The random forest model for the classification of aromatic amines achieved an AUC-ROC value of 0.82 and a true positive rate of 0.58 on the test set. The models were successfully applied to prioritize LC-HRMS features in surface water samples, showcasing the high potential to develop and implement this approach further.

Keywords
machine learning, mass spectrometry, nontarget screening, prioritization, structural alerts, toxicity
National Category
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-242584 (URN); 10.1021/acs.est.4c10498 (DOI); 001440393700001 (); 40051380 (PubMedID); 2-s2.0-86000637771 (Scopus ID)
Available from: 2025-04-28. Created: 2025-04-28. Last updated: 2025-04-28. Bibliographically approved.
Lauria, M. Z., Sepman, H., Ledbetter, T., Plassmann, M., Roos, A. M., Simon, M., . . . Kruve, A. (2024). Closing the Organofluorine Mass Balance in Marine Mammals Using Suspect Screening and Machine Learning-Based Quantification. Environmental Science and Technology, 58(5), 2458-2467
2024 (English). In: Environmental Science and Technology, ISSN 0013-936X, E-ISSN 1520-5851, Vol. 58, no 5, p. 2458-2467. Article in journal (Refereed). Published.
Abstract [en]

High-resolution mass spectrometry (HRMS)-based suspect and nontarget screening has identified a growing number of novel per- and polyfluoroalkyl substances (PFASs) in the environment. However, without analytical standards, the fraction of overall PFAS exposure accounted for by these suspects remains ambiguous. Fortunately, recent developments in ionization efficiency (IE) prediction using machine learning offer the possibility to quantify suspects lacking analytical standards. In the present work, a gradient boosted tree-based model for predicting log IE in negative mode was trained and then validated using 33 PFAS standards. The root-mean-square errors were 0.79 (for the entire test set) and 0.29 (for the 7 PFASs in the test set) log IE units. Thereafter, the model was applied to samples of liver from pilot whales (n = 5; East Greenland) and white beaked dolphins (n = 5, West Greenland; n = 3, Sweden) which contained a significant fraction (up to 70%) of unidentified organofluorine and 35 unquantified suspect PFASs (confidence level 2–4). IE-based quantification reduced the fraction of unidentified extractable organofluorine to 0–27%, demonstrating the utility of the method for closing the fluorine mass balance in the absence of analytical standards.

Keywords
Combustion ion chromatography, high resolution mass spectrometry, suspect screening, ionization efficiency-based quantification, dolphins, cetaceans
National Category
Analytical Chemistry; Environmental Sciences
Identifiers
urn:nbn:se:su:diva-226906 (URN); 10.1021/acs.est.3c07220 (DOI); 001158562000001 (); 38270113 (PubMedID); 2-s2.0-85184304201 (Scopus ID)
Available from: 2024-03-04. Created: 2024-03-04. Last updated: 2025-03-23. Bibliographically approved.
Souihi, A. & Kruve, A. (2024). Estimating LoD-s Based on the Ionization Efficiency Values for the Reporting and Harmonization of Amenable Chemical Space in Nontargeted Screening LC/ESI/HRMS. Analytical Chemistry, 96(28), 11263-11272
2024 (English). In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 96, no 28, p. 11263-11272. Article in journal (Refereed). Published.
Abstract [en]

Nontargeted LC/ESI/HRMS aims to detect and identify organic compounds present in the environment without prior knowledge; however, in practice no LC/ESI/HRMS method is capable of detecting all chemicals, and the scope depends on the instrumental conditions. Different experimental conditions, instruments, and methods used for sample preparation and nontargeted LC/ESI/HRMS as well as different workflows for data processing may lead to challenges in communicating the results and sharing data between laboratories as well as reduced reproducibility. One of the reasons is that only a fraction of method performance characteristics can be determined for a nontargeted analysis method due to the lack of prior information and analytical standards of the chemicals present in the sample. The limit of detection (LoD) is one of the most important performance characteristics in target analysis and directly describes the detectability of a chemical. Recently, the identification and quantification in nontargeted LC/ESI/HRMS (e.g., via predicting ionization efficiency, risk scores, and retention times) have significantly improved due to employing machine learning. In this work, we hypothesize that the predicted ionization efficiency could be used to estimate LoD and thereby enable evaluating the suitability of the LC/ESI/HRMS nontargeted method for the detection of suspected chemicals even if analytical standards are lacking. For this, 221 representative compounds were selected from the NORMAN SusDat list (S0), and LoD values were determined by using 4 complementary approaches. The LoD values were correlated to ionization efficiency values predicted with previously trained random forest regression. A robust regression was then used to estimate LoD values of unknown features detected in the nontargeted screening of wastewater samples. These estimated LoD values were used for prioritization of the unknown features. Furthermore, we present LoD values for the NORMAN SusDat list with a reversed-phase C18 LC method.

National Category
Analytical Chemistry
Research subject
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-232640 (URN); 10.1021/acs.analchem.4c01002 (DOI); 001264266500001 (); 2-s2.0-85197651002 (Scopus ID)
Available from: 2024-08-20. Created: 2024-08-20. Last updated: 2024-08-22. Bibliographically approved.
Peets, P., Rian, M. B., Martin, J. W. & Kruve, A. (2024). Evaluation of Nontargeted Mass Spectral Data Acquisition Strategies for Water Analysis and Toxicity-Based Feature Prioritization by MS2Tox. Environmental Science and Technology, 58(39), 17406-17418
2024 (English). In: Environmental Science and Technology, ISSN 0013-936X, E-ISSN 1520-5851, Vol. 58, no 39, p. 17406-17418. Article in journal (Refereed). Published.
Abstract [en]

The machine-learning tool MS2Tox can prioritize hazardous nontargeted molecular features in environmental waters, by predicting acute fish lethality of unknown molecules based on their MS2 spectra, prior to structural annotation. It has yet to be investigated how the extent of molecular coverage, MS2 spectra quality, and toxicity prediction confidence depend on sample complexity and MS2 data acquisition strategies. We compared two common nontargeted MS2 acquisition strategies with liquid chromatography high-resolution mass spectrometry for structural annotation accuracy by SIRIUS+CSI:FingerID and MS2Tox toxicity prediction of 191 reference chemicals spiked to LC-MS water, groundwater, surface water, and wastewater. Data-dependent acquisition (DDA) resulted in higher rates (19-62%) of correct structural annotations among reference chemicals in all matrices except wastewaters, compared to data-independent acquisition (DIA, 19-50%). However, DIA resulted in higher MS2 detection rates (59-84% DIA, 37-82% DDA), leading to higher true positive rates for spectral library matching, 40-73% compared to 34-72%. DDA resulted in higher MS2Tox toxicity prediction accuracy than DIA, with root-mean-square errors of 0.62 and 0.71 log-mM, respectively. Given the importance of MS2 spectral quality, we introduce a “CombinedConfidence” score to convey relative confidence in MS2Tox predictions and apply this approach to prioritize potentially ecotoxic nontargeted features in environmental waters.

Keywords
high-resolution mass spectrometry, LC-HRMS, LC50, machine-learning, MS/MS data acquisition methods, nontargeted analysis, nontargeted screening, toxicity prediction
National Category
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-237653 (URN); 10.1021/acs.est.4c02833 (DOI); 001317074300001 (); 39297340 (PubMedID); 2-s2.0-85204534500 (Scopus ID)
Available from: 2025-01-13. Created: 2025-01-13. Last updated: 2025-01-13. Bibliographically approved.
Palm, E., Engelhardt, J., Tshepelevitsh, S., Weiss, J. M. & Kruve, A. (2024). Gas Phase Reactivity of Isomeric Hydroxylated Polychlorinated Biphenyls. Journal of the American Society for Mass Spectrometry, 35(5), 1021-1029
2024 (English). In: Journal of the American Society for Mass Spectrometry, ISSN 1044-0305, E-ISSN 1879-1123, Vol. 35, no 5, p. 1021-1029. Article in journal (Refereed). Published.
Abstract [en]

Identification of stereo- and positional isomers detected with high-resolution mass spectrometry (HRMS) is often challenging due to near-identical fragmentation spectra (MS2), similar retention times, and collision cross-section values (CCS). Here we address this challenge on the example of hydroxylated polychlorinated biphenyls (OH-PCBs) with the aim to (1) distinguish between isomers of OH-PCBs using two-dimensional ion mobility spectrometry (2D-IMS) and (2) investigate the structure of the fragments of OH-PCBs and their fragmentation mechanisms by ion mobility spectrometry coupled to high-resolution mass spectrometry (IMS-HRMS). The MS2 spectra as well as CCS values of the deprotonated molecule and fragment ions were measured for 18 OH-PCBs using flow injections coupled to a cyclic IMS-HRMS. The MS2 spectra as well as the CCS values of the parent and fragment ions were similar between parent compound isomers; however, ion mobility separation of the fragment ions hints at the formation of isomeric fragments. Different parent compound isomers also yielded different numbers of isomeric fragment mobilogram peaks, giving new insights into the fragmentation of these compounds and indicating new possibilities for identification. For spectral interpretation, Gibbs free energies and CCS values for the fragment ions of 4'-OH-CB35, 4'-OH-CB79, 2-OH-CB77, and 4-OH-CB107 were calculated and enabled assignment of structures to the isomeric mobilogram peaks of [M-H-HCl]⁻ fragments. Finally, further fragmentation of the isomeric fragments revealed different fragmentation pathways depending on the isomeric fragment ions.

National Category
Subatomic Physics
Identifiers
urn:nbn:se:su:diva-231272 (URN); 10.1021/jasms.4c00035 (DOI); 001240941700001 (); 38640444 (PubMedID); 2-s2.0-85191150193 (Scopus ID)
Available from: 2024-06-19. Created: 2024-06-19. Last updated: 2024-09-05. Bibliographically approved.
Szabo, D., Falconer, T. M., Fisher, C. M., Heise, T., Phillips, A. L., Vas, G., . . . Kruve, A. (2024). Online and Offline Prioritization of Chemicals of Interest in Suspect Screening and Non-targeted Screening with High-Resolution Mass Spectrometry. Analytical Chemistry, 96(9), 3707-3716
2024 (English). In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 96, no 9, p. 3707-3716. Article, review/survey (Refereed). Published.
Abstract [en]

Recent advances in high-resolution mass spectrometry (HRMS) have enabled the detection of thousands of chemicals from a single sample, while computational methods have improved the identification and quantification of these chemicals in the absence of reference standards typically required in targeted analysis. However, to determine the presence of chemicals of interest that may pose an overall impact on ecological and human health, prioritization strategies must be used to effectively and efficiently highlight chemicals for further investigation. Prioritization can be based on a chemical's physicochemical properties, structure, exposure, and toxicity, in addition to its regulatory status. This Perspective aims to provide a framework for the strategies used for chemical prioritization that can be implemented to facilitate high-quality research and communication of results. These strategies are categorized as either online or offline prioritization techniques. Online prioritization techniques trigger the isolation and fragmentation of ions from the low-energy mass spectra in real time, with user-defined parameters. Offline prioritization techniques, in contrast, highlight chemicals of interest after the data has been acquired; detected features can be filtered and ranked based on the relative abundance or the predicted structure, toxicity, and concentration imputed from the tandem mass spectrum (MS2). Here we provide an overview of these prioritization techniques and how they have been successfully implemented and reported in the literature to find chemicals of elevated risk to human and ecological environments. A complete list of software and tools is available from https://nontargetedanalysis.org/.

National Category
Environmental Sciences; Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-227807 (URN); 10.1021/acs.analchem.3c05705 (DOI); 001173752100001 (); 38380899 (PubMedID); 2-s2.0-85186193222 (Scopus ID)
Available from: 2024-04-05. Created: 2024-04-05. Last updated: 2024-04-29. Bibliographically approved.
Rahu, I., Kull, M. & Kruve, A. (2024). Predicting the Activity of Unidentified Chemicals in Complementary Bioassays from the HRMS Data to Pinpoint Potential Endocrine Disruptors. Journal of Chemical Information and Modeling, 64(8), 3093-3104
2024 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 64, no 8, p. 3093-3104. Article in journal (Refereed). Published.
Abstract [en]

The majority of chemicals detected via nontarget liquid chromatography high-resolution mass spectrometry (HRMS) in environmental samples remain unidentified, challenging the capability of existing machine learning models to pinpoint potential endocrine disruptors (EDs). Here, we predict the activity of unidentified chemicals across 12 bioassays related to EDs within the Tox21 10K dataset. Single- and multi-output models, utilizing various machine learning algorithms and molecular fingerprint features as an input, were trained for this purpose. To evaluate the models under near real-world conditions, Monte Carlo sampling was implemented for the first time. This technique enables the use of probabilistic fingerprint features derived from the experimental HRMS data with SIRIUS+CSI:FingerID as an input for models trained on true binary fingerprint features. Depending on the bioassay, the lowest false-positive rate at 90% recall ranged from 0.251 (sr.mmp, mitochondrial membrane potential) to 0.824 (nr.ar, androgen receptor), which is consistent with the trends observed in the models' performances submitted for the Tox21 Data Challenge. These findings underscore the informativeness of fingerprint features that can be compiled from HRMS in predicting the endocrine-disrupting activity. Moreover, an in-depth SHapley Additive exPlanations analysis unveiled the models' ability to pinpoint structural patterns linked to the modes of action of active chemicals. Despite the superior performance of the single-output models compared to that of the multi-output models, the latter's potential cannot be disregarded for similar tasks in the field of in silico toxicology. This study presents a significant advancement in identifying potentially toxic chemicals within complex mixtures without unambiguous identification and effectively reducing the workload for postprocessing by up to 75% in nontarget HRMS.

National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-228594 (URN); 10.1021/acs.jcim.3c02050 (DOI); 001190721800001 (); 38523265 (PubMedID); 2-s2.0-85188780509 (Scopus ID)
Available from: 2024-04-23. Created: 2024-04-23. Last updated: 2025-02-07. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-9725-3351