Machine learning for detection and identification of emerging contaminants with non-targeted LC/ESI/HRMS screening
Stockholm University, Faculty of Science, Department of Materials and Environmental Chemistry (MMK). ORCID iD: 0000-0001-8590-4276
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Environmental water analysis remains challenging due to the complexity of the matrix and the low concentrations of emerging contaminants. Liquid chromatography coupled to high-resolution mass spectrometry via an electrospray ionization source (LC/ESI/HRMS) is a powerful technique that offers high sensitivity and selectivity for detecting organic compounds. Therefore, non-targeted screening (NTS) with LC/ESI/HRMS has been widely used to detect and identify emerging contaminants in environmental water samples. Nevertheless, many aspects of the detection and identification of unknown LC/ESI/HRMS features remain challenging due to the large amount of data and the lack of analytical standards for many compounds.

The main aim of this thesis is to utilize machine learning to improve the detection and identification of emerging contaminants throughout the NTS workflow, from sample preparation to reporting the results. In Paper I, different solid-phase extraction (SPE) cartridges were compared to aid the selection of a suitable SPE cartridge for NTS of environmental water samples. Furthermore, machine learning was used to model the SPE recoveries, and the developed model was assessed and validated using an external dataset. Paper II studied the impact of mobile phase (pH, organic modifier, additive) and stationary phase on the LC/ESI/HRMS sensitivity towards 78 selected emerging contaminants. The results guided the selection of chromatographic conditions for LC/ESI/HRMS analysis of wastewater samples, where three selected mobile phases were further tested. Paper III describes the impact of mobile phase pH, organic modifier, additive and stationary phase on the liquid chromatography retention of 78 selected emerging contaminants. Here, I developed the MultiConditionRT model to predict liquid chromatography retention times for four retention mechanisms in combination with two organic modifiers, pH values from 2.1 to 10.0, and seven additives. MultiConditionRT was validated using internal and external datasets containing 408 new compounds. In Paper IV, a new approach was developed to estimate the limit of detection (LoD) based on predicted ionization efficiency. This approach can be utilized to prioritize unknown or tentatively identified features and to assess the detectability of chemicals with NTS methods.

Overall, this thesis illustrates how machine learning can be used to improve the detection and identification of emerging contaminants from NTS of environmental waters with LC/ESI/HRMS. In particular, the findings and data presented in this thesis offer valuable insights into the importance of accounting for the analysis conditions while improving the NTS toolbox.

Place, publisher, year, edition, pages
Stockholm: Department of Materials and Environmental Chemistry (MMK), Stockholm University, 2024, p. 65
Keywords [en]
Predictive models, Solid-phase extraction, Limit of detection, Liquid chromatography retention times, Reporting and harmonization of NTS
National Category
Analytical Chemistry
Research subject
Analytical Chemistry
Identifiers
URN: urn:nbn:se:su:diva-232649
ISBN: 978-91-8014-899-3 (print)
ISBN: 978-91-8014-900-6 (electronic)
OAI: oai:DiVA.org:su-232649
DiVA, id: diva2:1890917
Public defence
2024-10-04, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16B, Stockholm, 10:00 (English)
Opponent
Supervisors
Available from: 2024-09-11 Created: 2024-08-21 Last updated: 2024-09-04. Bibliographically approved
List of papers
1. Predicting solid-phase extraction recovery of emerging contaminants for non-targeted LC/ESI/HRMS screening of water
(English). Manuscript (preprint) (Other academic)
National Category
Analytical Chemistry
Research subject
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-232648 (URN)
Available from: 2024-08-21 Created: 2024-08-21 Last updated: 2024-08-21
2. Mobile phase and column chemistry selection for high sensitivity non-targeted LC/ESI/HRMS screening of water
2023 (English). In: Analytica Chimica Acta, ISSN 0003-2670, E-ISSN 1873-4324, Vol. 1274, article id 341573. Article in journal (Refereed). Published
Abstract [en]

Systematic selection of mobile phase and column chemistry type can be critical for achieving optimal chromatographic separation, high sensitivity, and low detection limits in liquid chromatography electrospray high resolution mass spectrometry (LC/MS). However, the selection process is challenging for non-targeted screening, where the compounds of interest are neither preselected nor available for method optimization. To provide general guidance, twenty different mobile phase compositions and four columns were compared for the analysis of 78 compounds with a wide range of physicochemical properties (logP range from -1.46 to 5.48), and analyte sensitivity was compared between methods. The pH, additive type, column, and organic modifier had significant effects on the analyte response factors, and acidic mobile phases (e.g. 0.1% formic acid) yielded the highest sensitivity. In some cases, the effect was attributable to the difference in organic modifier content at the time of elution, depending on the mobile phase and column chemistry. Based on these findings, 0.1% formic acid, 0.1% ammonia and 5.0 mM ammonium fluoride were further evaluated for their performance in non-targeted LC/ESI/HRMS analysis of wastewater treatment plant influent and effluent, using data-dependent MS2 acquisition and two different data processing workflows (MS-DIAL, patRoon 2.1) to compare the number of detected features and sensitivity. Both data-processing workflows indicated that 0.1% formic acid yielded the highest number of features in the full scan spectrum (MS1), as well as the highest number of features that triggered fragmentation spectra (MS2) when dynamic exclusion was used.

Keywords
Identification of unknowns, Method optimization, Data dependent acquisition, Limit of detection, Ionization efficiency
National Category
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-220864 (URN)
10.1016/j.aca.2023.341573 (DOI)
001029226900001 ()
37455083 (PubMedID)
2-s2.0-85163880381 (Scopus ID)
Available from: 2023-09-12 Created: 2023-09-12 Last updated: 2024-08-21. Bibliographically approved
3. MultiConditionRT: Predicting liquid chromatography retention time for emerging contaminants for a wide range of eluent compositions and stationary phases
2022 (English). In: Journal of Chromatography A, ISSN 0021-9673, E-ISSN 1873-3778, Vol. 1666, article id 462867. Article in journal (Refereed). Published
Abstract [en]

Structural elucidation of compounds detected with liquid chromatography coupled to high resolution mass spectrometry is a challenging and time-consuming step in the workflow of non-targeted analysis and often requires manual validation of the results. Retention time, alongside exact mass, isotope pattern, fragmentation spectra, and collision cross-section, is valuable information for ruling out unlikely structures and increasing the confidence in others. Different approaches to predict retention times have been used previously for reversed phase chromatography and hydrophilic interaction liquid chromatography (HILIC), but their application is limited to a small set of mobile phases and gradient profiles. Here, we expand the toolbox available for retention time predictions by developing a random forest regression model for predicting retention times for four column types and twenty mobile phase systems. MultiConditionRT was built using a dataset containing 78 compounds analyzed with C18 reversed phase, mixed mode, HILIC, and biphenyl columns. In addition, different eluent compositions were used: both methanol and acetonitrile were combined with different aqueous phases with pH from 2.1 to 10.0 (formic acid, acetic acid, trifluoroacetic acid, formate, acetate, bicarbonate, and ammonia). The root mean square error (RMSE) of the test set predictions was 1.55 min for the C18 reversed phase, 1.79 min for the mixed-mode, 1.93 min for the HILIC, and 1.56 min for the biphenyl column. Additionally, MultiConditionRT can be applied to different gradient profiles with a general additive model-based calibration approach. MultiConditionRT was validated externally and internally with 356 and 151 compounds, respectively, yielding RMSEs of 2.68 and 2.32 min. Of these compounds, 324 and 84, respectively, were not in the dataset used for model development.
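
The per-column RMSE values quoted in this abstract summarize how far predicted retention times fall from measured ones. As a minimal sketch of that metric only (not the MultiConditionRT model itself), the Python snippet below groups prediction errors by column type and computes one RMSE per group. The function name `rmse_by_group` and all retention times are made up for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


def rmse_by_group(records):
    """Return {group: RMSE} for (group, observed, predicted) tuples.

    Hypothetical helper: 'group' stands in for a column type, and the
    two numbers are measured and predicted retention times in minutes.
    """
    squared_error_sum = defaultdict(float)
    count = defaultdict(int)
    for group, observed, predicted in records:
        squared_error_sum[group] += (observed - predicted) ** 2
        count[group] += 1
    return {g: math.sqrt(squared_error_sum[g] / count[g]) for g in count}


# Made-up retention times (min); not data from the study.
records = [
    ("C18", 5.0, 6.0),
    ("C18", 10.0, 9.0),
    ("HILIC", 3.0, 5.0),
    ("HILIC", 8.0, 8.0),
]
rmse = rmse_by_group(records)
for column, value in sorted(rmse.items()):
    print(f"{column}: RMSE = {value:.2f} min")
```

Reporting the error separately per column type, as the abstract does, keeps a well-predicted retention mechanism from masking a poorly predicted one in a single pooled figure.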

Keywords
High resolution mass spectrometry, Random forest regression, Gradient elution, Quantitative structure-retention relationship model
National Category
Chemical Sciences
Identifiers
urn:nbn:se:su:diva-202764 (URN)
10.1016/j.chroma.2022.462867 (DOI)
000756457600003 ()
35139450 (PubMedID)
Available from: 2022-03-11 Created: 2022-03-11 Last updated: 2024-08-21. Bibliographically approved
4. Estimating LoD-s Based on the Ionization Efficiency Values for the Reporting and Harmonization of Amenable Chemical Space in Nontargeted Screening LC/ESI/HRMS
2024 (English). In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 96, no 28, p. 11263-11272. Article in journal (Refereed). Published
Abstract [en]

Nontargeted LC/ESI/HRMS aims to detect and identify organic compounds present in the environment without prior knowledge; however, in practice no LC/ESI/HRMS method is capable of detecting all chemicals, and the scope depends on the instrumental conditions. Different experimental conditions, instruments, and methods used for sample preparation and nontargeted LC/ESI/HRMS as well as different workflows for data processing may lead to challenges in communicating the results and sharing data between laboratories as well as reduced reproducibility. One of the reasons is that only a fraction of method performance characteristics can be determined for a nontargeted analysis method due to the lack of prior information and analytical standards of the chemicals present in the sample. The limit of detection (LoD) is one of the most important performance characteristics in target analysis and directly describes the detectability of a chemical. Recently, the identification and quantification in nontargeted LC/ESI/HRMS (e.g., via predicting ionization efficiency, risk scores, and retention times) have significantly improved due to employing machine learning. In this work, we hypothesize that the predicted ionization efficiency could be used to estimate LoD and thereby enable evaluating the suitability of the LC/ESI/HRMS nontargeted method for the detection of suspected chemicals even if analytical standards are lacking. For this, 221 representative compounds were selected from the NORMAN SusDat list (S0), and LoD values were determined by using 4 complementary approaches. The LoD values were correlated to ionization efficiency values predicted with previously trained random forest regression. A robust regression was then used to estimate LoD values of unknown features detected in the nontargeted screening of wastewater samples. These estimated LoD values were used for prioritization of the unknown features. 
Furthermore, we present LoD values for the NORMAN SusDat list with a reversed-phase C18 LC method.
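
The idea of correlating LoD with predicted ionization efficiency through a robust regression can be illustrated with a small sketch. The abstract does not state which robust estimator was used, so the Theil-Sen estimator (median of pairwise slopes) is swapped in here purely as an assumption. The function name `theil_sen`, the log10 scaling, and all data points are hypothetical, and the made-up data simply assume that LoD decreases as ionization efficiency increases.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Fit y = slope * x + intercept with the Theil-Sen estimator.

    Assumption: this stands in for the unspecified 'robust regression'
    in the abstract. The slope is the median of all pairwise slopes,
    so a minority of outlying points barely affects the fit.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Made-up calibration set: log10 predicted ionization efficiency versus
# log10 measured LoD. The last point is a deliberate outlier.
log_ie = [1.0, 2.0, 3.0, 4.0, 5.0]
log_lod = [-5.0, -6.0, -7.0, -8.0, -3.0]

slope, intercept = theil_sen(log_ie, log_lod)

# Estimate log10(LoD) for a hypothetical unknown feature whose
# predicted log10 ionization efficiency is 2.5.
estimated_log_lod = slope * 2.5 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"estimated log10(LoD)={estimated_log_lod:.2f}")
```

In this toy data the outlier does not pull the fitted line away from the trend of the other four points, which is the property that makes a robust fit attractive when some measured LoD values are unreliable.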

National Category
Analytical Chemistry
Research subject
Analytical Chemistry
Identifiers
urn:nbn:se:su:diva-232640 (URN)
10.1021/acs.analchem.4c01002 (DOI)
001264266500001 ()
2-s2.0-85197651002 (Scopus ID)
Available from: 2024-08-20 Created: 2024-08-20 Last updated: 2024-08-22. Bibliographically approved

Open Access in DiVA

Machine learning for detection and identification of emerging contaminants with non-targeted LC/ESI/HRMS screening (2366 kB), 44 downloads
File information
File name: FULLTEXT01.pdf
File size: 2366 kB
Checksum (SHA-512):
7d6c35740961f2b91cc563ebae0ad1a6bff8892f6a1500cae98fac3ba20239cd227ccc293ce3d33170a5db187e19873ee3b202537bd3123d11a5802bb9c7121c
Type: fulltext
Mimetype: application/pdf

Authority records

Souihi, Amina
