2019
DOI: 10.1371/journal.pone.0223517

Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization

Abstract: A detailed characterization of the chemical composition of complex substances, such as products of petroleum refining and environmental mixtures, is greatly needed in exposure assessment and manufacturing. The inherent complexity and variability in the composition of complex substances obfuscate the choices for their detailed analytical characterization. Yet, in lieu of exact chemical composition of complex substances, evaluation of the degree of similarity is a sensible path toward decision-making in environm…

Cited by 23 publications (23 citation statements)
References 36 publications
“…To evaluate the outcome of such grouping, we included a quantitative metric into the unsupervised analysis workflow to assess the correspondence of the outcome to the original categories of each chemical. The details of the unsupervised analysis workflow are described elsewhere ( Onel et al, 2019 ). Briefly, clustering was performed using the hclust function in R, using average linkage clustering applied to a Euclidean distance metric on centered, scaled data (essentially Pearson correlation), which we have previously found to be reasonably robust ( Onel et al, 2019 ).…”
Section: Methods (mentioning)
confidence: 99%
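
The clustering step quoted above can be sketched as follows. This is a minimal illustration only, not the cited authors' workflow: they used hclust in R, whereas this is a plain-Python stand-in for average linkage on Euclidean distances between centered, scaled features. The function names, the toy measurements, and the choice of two clusters are all assumptions made for the example.

```python
import math

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(rows, k):
    """Merge the two closest clusters until k remain; return one label per row."""
    clusters = [[i] for i in range(len(rows))]

    def dist(c1, c2):
        # Average linkage: mean pairwise distance between members of two clusters.
        total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: dist(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(b)      # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical measurements: two features for six samples (illustrative values).
data = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
        [5.0, 30.0], [5.5, 32.0], [4.8, 29.0]]
labels = average_linkage(standardize(data), k=2)
print(labels)
```

In R, a comparable call chain would be hclust(dist(scale(x)), method = "average") followed by cutree; the pure-Python version trades speed for having no dependencies.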
“…The details of the unsupervised analysis workflow are described elsewhere ( Onel et al, 2019 ). Briefly, clustering was performed using the hclust function in R, using average linkage clustering applied to a Euclidean distance metric on centered, scaled data (essentially Pearson correlation), which we have previously found to be reasonably robust ( Onel et al, 2019 ). The Fowlkes-Mallows (FM) index ( Fowlkes and Mallows, 1983 ), a measure of similarity of two clusters, was calculated to enable quantitative comparative assessment between groupings achieved using each dataset to the known chemical categories.…”
Section: Methods (mentioning)
confidence: 99%
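
For readers unfamiliar with the Fowlkes-Mallows index mentioned in this statement, the sketch below computes it by pair counting: FM = TP / sqrt((TP + FP)(TP + FN)), where TP counts sample pairs grouped together in both partitions and FP and FN count pairs grouped together in only one of them. The function name and the example labels are hypothetical and are not taken from the cited work.

```python
import math
from itertools import combinations

def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index between two partitions of the same samples."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1          # pair grouped together in both partitions
        elif same_a:
            fp += 1          # together only in the first partition
        elif same_b:
            fn += 1          # together only in the second partition
    if tp == 0:
        return 0.0
    return tp / math.sqrt((tp + fp) * (tp + fn))

# Hypothetical example: cluster assignments versus known categories.
cluster_labels = [0, 0, 1, 1, 1, 1]
known_categories = ["A", "A", "A", "B", "B", "B"]
fm = fowlkes_mallows(cluster_labels, known_categories)
print(round(fm, 4))
```

The index is 1 when the two partitions agree exactly and falls toward 0 as fewer pairs are grouped consistently, which is what makes it usable for comparing a clustering against known categories.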
“…In machine learning, SVMs are extensively used for classification and regression-type of analyses, spanning over several different application areas including but not limited to fault detection and diagnosis [30-32], improvement of process operations [33], and predictive modeling of complex substances [34,35]. In this work, we use an SVM model to mimic the implicit constraint that defines the feasibility of the solution of a DAE system. Specifically, we build an SVM-based classification model in the offline phase by using a dataset of simulated samples with their binary outcome (feasible/infeasible).…”
Section: Modeling Implicit Constraints With SVMs (mentioning)
confidence: 99%
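
The idea of learning a feasible/infeasible boundary from simulated samples can be illustrated with a toy classifier. The sketch below is a deliberately simplified stand-in: it trains a linear SVM by full-batch subgradient descent on the regularized hinge loss. The kernel, solver, and data used in the citing work are not given in the statement, so the linear model, the hyperparameters, and the eight invented two-dimensional samples here are all assumptions.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=2000):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]   # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                # sample violates the margin
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def is_feasible(w, b, x):
    """Predict feasibility from the sign of the decision function."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

# Invented "simulation outcomes": +1 = feasible, -1 = infeasible.
samples = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.3),
           (1.0, 1.0), (0.9, 0.8), (0.8, 1.0), (1.2, 0.7)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
print(is_feasible(w, b, (0.1, 0.1)), is_feasible(w, b, (1.0, 1.0)))
```

Once trained offline, such a classifier can be queried cheaply in place of running the full simulation; in practice a library implementation with a nonlinear kernel would likely be preferred.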
“…In this work, the training set is a balanced subset of the active compounds that contain both agonist and antagonist chemicals and their corresponding 4 technical replicates, whereas the testing set is the remaining unseen active compounds with their corresponding 4 technical replicates, not used in the training phase. Supervised learning algorithms are widely studied in many fields of engineering and sciences primarily in classification and regression-type problems for predicting either a categorical output or a continuous output, respectively [24, 26-32]. Classification is the problem of finding the categorical output of a new observation and distinguishing between different classes of information via statistical recognition of patterns in a training data set.…”
Section: PLOS Computational Biology (mentioning)
confidence: 99%
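
The split described in this statement keeps every technical replicate of a compound on the same side of the train/test boundary. A minimal sketch of that idea follows; the compound IDs, the record layout, the number of training compounds per class, and the random seed are hypothetical, and the citing paper's actual selection procedure is not specified in the excerpt.

```python
import random
from collections import defaultdict

def split_by_compound(records, n_train_per_class, seed=0):
    """Pick the same number of compounds per class for training and keep
    all replicates of a compound together in either train or test."""
    by_class = defaultdict(set)
    for compound, label, _replicate in records:
        by_class[label].add(compound)
    rng = random.Random(seed)
    train_ids = set()
    for label in sorted(by_class):
        train_ids.update(rng.sample(sorted(by_class[label]), n_train_per_class))
    train = [r for r in records if r[0] in train_ids]
    test = [r for r in records if r[0] not in train_ids]
    return train, test

# Hypothetical data set: 4 agonists and 3 antagonists, 4 replicates each.
classes = {"c1": "agonist", "c2": "agonist", "c3": "agonist", "c4": "agonist",
           "c5": "antagonist", "c6": "antagonist", "c7": "antagonist"}
records = [(cid, cls, rep) for cid, cls in classes.items() for rep in range(1, 5)]

train, test = split_by_compound(records, n_train_per_class=2)
print(len(train), len(test))
```

Splitting on compound identity rather than on individual rows avoids the leakage that would occur if replicates of one compound appeared in both the training and the testing set.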