Abstract: Linear regression models are traditionally used to capture the relation between the input and output variables. Linear models cannot account for the nonlinear relations in the data. Hence, the prediction models may not be accurate. For this reason, machine learning-based models are being increasingly used. For modeling, design, and scaleup of rotating disc contactors (RDCs), rational estimation of dispersed-phase holdup and drop size is crucial. We have employed random forest (RF) and autoencoder−RF-based mode…
“…Since, we have not generated any new data in our work, we have adopted the definition of the drop size as per the available literature. More comprehensive information on the experimental data and the empirical correlations is made available as Supporting Information in our previous work² and is not reproduced here for the sake of brevity. The Supporting Information can be accessed at https://pubs.acs.org/doi/10.1021/acs.iecr.0c04149…”
Section: Brief Overview Of Existing Work On Drop Size
mentioning (confidence: 99%)
“…The empirical correlations and the AARE values can be found in the Supporting Information of our previous work.² The random forest model with a 5-fold cross-validation was developed as part of our previous work.² Machine learning models have been widely employed in chemical engineering for estimation of bubble size and holdup in a bubble column, flow regime identification, estimation of mass transport coefficient, etc.…”
Section: Brief Overview Of Existing Work On Drop Size
mentioning (confidence: 99%)
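The 5-fold cross-validation mentioned in the statement above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `k_fold_indices`, the shuffling seed, and the sample count in the usage note are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; each sample lands in exactly
    one test fold, and the remaining folds form the training set.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx
```

In use, a model would be fitted on each `train_idx` and scored on the matching `test_idx`, for example `for train_idx, test_idx in k_fold_indices(23, k=5): ...`, with the five scores averaged afterwards.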
“…However, it was observed that the linear regression model gave a poor prediction performance as R² and AARE for the test set were found to be 0.6230 and 27.21%, respectively.² The random forest model based on random forest models with top features was developed. The prediction performance of the random forest for a test drop size data set was found to be R² = 0.8725 and AARE = 15.7946%.…”
Section: Brief Overview Of Existing Work On Drop Size
mentioning
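The two metrics quoted in the statements above, R² and AARE, can be computed as follows. This is a hedged sketch: it assumes the usual definitions (AARE as the mean of |predicted − measured| / |measured|, expressed in percent, and R² as one minus the ratio of residual to total sum of squares), which may differ in detail from the definitions used in the cited work. The function names and the sample values in the usage note are hypothetical.

```python
def aare_percent(y_true, y_pred):
    """Average absolute relative error in percent (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, `aare_percent([2.0, 4.0, 5.0], [2.2, 3.6, 5.0])` averages relative errors of 10%, 10%, and 0%. Note that AARE is undefined when any measured value is zero, which is not a concern for physical drop sizes.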
Drop size is a crucial parameter
for the efficient design and operation
of the rotating disc contactor (RDC) in liquid–liquid extraction.
The current work focuses on providing local and global explanations
for the prediction of the drop size in a rotating disc contactor (RDC).
The Random Forest (RF) regression model is a robust machine learning
algorithm that can accurately capture complex relationships in the
data. However, the interpretability of the model is limited. To address
this limitation, we applied Local Interpretable Model-Agnostic
Explanations (LIME) to the predictions of the developed RF model. This provides
both local and global views of the model and thereby helps one to
gain insights into the factors influencing predictions. We have provided
local explanations depicting the impact of different attributes on
the prediction of the output for any given input example. We have
also obtained global feature importance, providing the top subset
of informative attributes. We have also developed local surrogate
models incorporating second order attribute interactions. This has
provided important information about the effect of interactions on
the drop size prediction. By augmenting the random forest model with
LIME, it is possible to develop a more accurate and interpretable
model for estimating the drop size in RDCs, ultimately leading to
improved performance and efficiency.
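The core idea behind a LIME-style local explanation can be sketched as follows: perturb the input of interest, query the black-box model on the perturbed points, weight them by proximity, and fit a weighted linear surrogate whose coefficients serve as the local attributions. This is a minimal sketch under stated assumptions, not the `lime` library and not the authors' implementation. The `black_box` function is a hypothetical stand-in for a trained RF's prediction function, and the sampling scale, kernel width, and sample count are illustrative choices.

```python
import numpy as np


def black_box(X):
    """Hypothetical stand-in for a trained model's predict(); not an RF."""
    return X[:, 0] ** 2 + 3.0 * X[:, 1]


def local_linear_explanation(predict, x0, n_samples=2000, scale=0.1,
                             kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around the point x0.

    Returns (intercept, weights); weights[i] approximates the local
    effect of attribute i on the prediction near x0.
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    # 1. Perturb the instance with Gaussian noise (assumed sampling scheme).
    X = x0 + scale * rng.standard_normal((n_samples, x0.size))
    # 2. Query the black-box model on the perturbed points.
    y = predict(X)
    # 3. Weight each sample by an exponential kernel on distance to x0.
    dist = np.linalg.norm(X - x0, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted least squares on centered attributes plus an intercept.
    A = np.hstack([np.ones((n_samples, 1)), X - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]
```

Calling `local_linear_explanation(black_box, [1.0, 2.0])` returns weights close to the local slopes of the stand-in function at that point. A surrogate with second-order attribute interactions, as mentioned in the abstract, could be obtained by appending product columns such as `(X[:, 0] - x0[0]) * (X[:, 1] - x0[1])` to the design matrix `A`.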
“…Persistent lack of physical comprehension continuously stymies preferable prediction performance of the key parameters in multiphase flow and reactor systems, although scientists have made systematic contributions to experimentally formulated correlations throughout the past decades. The correlations of the key parameters in multiphase units are commonly expressed by gas/liquid/solid phase properties, operating conditions (e.g., phase concentration, velocity, and temperature), devices configurations (e.g., height and diameter), or a combination of them in dimensionless forms such as Archimedes, Froude, Nusselt, Reynolds, Sherwood, and Weber numbers. However, the prediction discrepancies between the existing empirical correlations of key parameters such as the particle entrainment and minimum fluidization velocity in gas-particle riser flows can reach several orders of magnitude. Fortunately, the advanced research and development of flexible ML tools have the potential to complement the incomplete knowledge to boost the prediction ability of key multiphase field parameters such as mass flow rate/flux, minimum fluidization velocity, mixing rate/index, overall/local hold-up, pressure/pressure drop, velocity, temperature, and other parameters in multiphase/particulate flows and reactors.…”
Artificial intelligence (AI), machine
learning (ML), and data science
are leading to a promising transformative paradigm. ML, especially
deep learning and physics-informed ML, is a valuable toolkit that
complements incomplete domain-specific knowledge in conventional experimental
and computational methods. ML can provide flexible techniques to facilitate
the conceptual development of new robust predictive models for multiphase
flows and reactors by finding hidden patterns, information, and mechanisms
in a data set. Given these developments, we comprehensively survey,
explore, analyze, and discuss key advancements of recent ML applications
to hydrodynamics, heat and mass transfer, and reactions in single-phase
and multiphase flow systems from different aspects: (1) development
of multiphase closure models of drag force, turbulence stresses and
heat/mass transfer to improve the accuracy and efficiency of typical
CFD simulations; (2) image reconstruction, regime identification,
key parameter predictions, and optimization of multiphase flow and
transport fields; (3) reaction kinetics modeling (e.g., predictions
of reaction networks, kinetic parameters, and species production)
and reaction condition optimization. These sections also discuss and
analyze the key advantages and weaknesses of ML for solving problems
in the domain of multiphase flows and reactors. Finally, we summarize
the unresolved challenges and opportunities in order to identify
future directions that would be useful for the research community.
Future development and study of multiphase flows and reactors are
envisaged to be accelerated by ML and data science.
“…Advanced machine learning methods expand the application scope of the QSPR model. In recent years, ensemble learning methods, especially random forest (RF) and light gradient boosting machine (LightGBM), have yielded satisfactory results in dissolution prediction. In 2021, Ye et al. predicted the solubility of compounds in organic solvents with the LightGBM algorithm, which showed better generalization ability compared to deep learning and other traditional machine learning algorithms.…”
Aqueous solubility is one of the most important physicochemical properties in drug discovery. At present, the prediction of aqueous solubility of compounds is still a challenging problem. Machine learning has shown great potential in solubility prediction. Most machine learning models largely rely on the setting of hyperparameters, and their performance can be improved by setting the hyperparameters in a better way. In this paper, we used MACCS fingerprints to represent the structural features and optimized the hyperparameters of the light gradient boosting machine (LightGBM) with the cuckoo search algorithm (CS). Based on the above representation and optimization, the CS-LightGBM model was established to predict the aqueous solubility of 2446 organic compounds and the obtained prediction results were compared with those obtained with the other six different machine learning models (RF, GBDT, XGBoost, LightGBM, SVR, and BO-LightGBM). The comparison results showed that the CS-LightGBM model had a better prediction performance than the other six different models. RMSE, MAE, and R² of the CS-LightGBM model were, respectively, 0.7785, 0.5117, and 0.8575. In addition, this model has good scalability and can be used to solve solubility prediction problems in other fields such as solvent selection and drug screening.