2023
DOI: 10.1021/acsestwater.3c00134

Prediction of 35 Target Per- and Polyfluoroalkyl Substances (PFASs) in California Groundwater Using Multilabel Semisupervised Machine Learning

Jialin Dong,
Gabriel Tsai,
Christopher I. Olivares

Abstract: Comprehensive monitoring of perfluoroalkyl and polyfluoroalkyl substances (PFASs) is challenging because of the high analytical cost and an increasing number of analytes. We developed a machine learning pipeline to understand environmental features influencing PFAS profiles in groundwater. By examining 23 public data sets (2016−2022) in California, we built a state-wide groundwater database (25,000 observations across 4200 wells) encompassing contamination sources, weather, air quality, soil, hydrology, and g…

Cited by 4 publications
(9 citation statements)
References 53 publications
“…The train-test split ratio is the percentage of the data that would be used to train the model to that which would be used to test it. Split such as 80:20 was used by several authors (Azhagiya Singam et al, 2020; Dong et al, 2023; Hosseinzadeh et al, 2022; McMahon et al, 2022). Hu et al, (2021) divided the domestic well PFAS data in the ratio 80:20; this is consistent with the ratio adopted by DeLuca et al, (2023) who used a 100-iteration Monte Carlo holdout scheme to split PFAS data from Columbia River Basin fish tissue.…”
Section: Implementation Details Of the Methods
mentioning
confidence: 83%
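The 80:20 split and the 100-iteration Monte Carlo holdout scheme mentioned in this statement can be illustrated with a minimal sketch. It is not code from any of the cited studies: the function name `monte_carlo_holdout`, the sample count of 50, and the fixed seed are assumptions chosen only for illustration, and the sketch uses plain Python rather than any particular machine learning library.

```python
import random


def monte_carlo_holdout(n_samples, n_iterations=100, train_fraction=0.8, seed=0):
    """Return a list of (train_idx, test_idx) pairs from repeated random splits.

    Hypothetical helper for illustration: each iteration reshuffles all
    sample indices and cuts them at the requested train fraction.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = int(round(n_samples * train_fraction))
    splits = []
    for _ in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        train_idx = sorted(indices[:n_train])
        test_idx = sorted(indices[n_train:])
        splits.append((train_idx, test_idx))
    return splits


# Assumed toy data set of 50 observations, split 80:20 one hundred times.
splits = monte_carlo_holdout(n_samples=50)
first_train, first_test = splits[0]
print(len(splits), len(first_train), len(first_test))
```

Unlike a single fixed split, repeating the random holdout gives a distribution of test scores, which is why a study might report results over many iterations instead of one.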
“…This process is repeated several times (iteration number), changing the subset each time. Yuan et al, (2023) used a 5-fold CV, and George and Dixit, (2021) used 10 subsets (10-fold CV) to group groundwater wells to prioritize PFAS testing; the same number was used by Dong et al, (2023) for total PFAS prediction. In addition to this, Cao et al, (2023) performed 500 iterations on the training set for hyperparameter tuning.…”
Section: Implementation Details Of the Methods
mentioning
confidence: 99%
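The k-fold cross-validation described in this statement, where the held-out subset changes on each iteration, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of any cited study: the function name `k_fold_indices`, the 25-sample toy size, and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_idx, test_idx) pairs; each sample is tested exactly once.

    Hypothetical helper for illustration: indices are shuffled once, dealt
    into k folds, and each fold takes one turn as the test set.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


# Assumed toy data set of 25 observations with 10 folds.
splits = k_fold_indices(n_samples=25, k=10)
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

With 10 folds, every observation contributes to testing once and to training nine times, which uses a small data set more fully than a single holdout split.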
“…The special issue includes several review articles encompassing a wide spectrum, ranging from a historical perspective of water data to computational modeling in wastewater treatment to ML modeling of environmental chemical reactions, environmental toxicology, heavy metal removal, and cyanobacterial harmful algal blooms (HABs). One significant application of these innovative tools is ML-assisted environmental monitoring, which can address diverse problems, such as predicting effluent nutrients or influent flow rates and nutrient loads at wastewater treatment plants, formation of disinfection byproducts, drivers of the accumulation of potentially toxic elements in sediments, greenhouse gas emissions, occurrence of PFAS, water quality assessment, microplastics, microcystins, and differentiation of landfill leachate and domestic sludge. ML has also been extensively employed to model environmental chemical reactions and processes, including adsorption onto various materials, biodegradation, photodegradation, and the physicochemical and meteorological variables that affect the seasonal growth and decline of HABs.…”
mentioning
confidence: 99%