A lithology intelligent identification interpretability model is proposed based on Ensemble Learning Stacking, Permutation Importance (PI) and Local Interpretable Model-agnostic Explanations (LIME). The method aiming to provide more accurate geological information and more scientific theoretical support for oil and gas resource exploration. Two logging datasets from the public domain were used as experiments, and support vector machine (SVM), random forest (RF) and naive bayes (NB) were used as primary learners, and SVM as secondary learners, to classify lithology through stacking algorithm. Then, the evaluation indexes such as Area Under Curve (AUC), precision, recall and F1-score were used to verify its accuracy, and PI and LIME were used to explain the lithology identification model. The study shows that the results of the stacking algorithm have the best indexes and the highest prediction accuracy. In terms of overall interpretation, PHIND, GR and RT have the most influence on lithology identification of a natural gas protection area in the United States; DEN, CAL and PEF have the most influence on lithology identification in Daqing Oilfield in China. Interpreted from the perspective of a single sample, the LIME algorithm is able to give a quantitative prediction probability and the degree of influence of the characteristic variables.
A lithology intelligent identi cation interpretability model is proposed based on Ensemble LearningStacking, Permutation Importance (PI) and Local Interpretable Model-agnostic Explanations (LIME). The method aiming to provide more accurate geological information and more scienti c theoretical support for oil and gas resource exploration. Two logging datasets from the public domain were used as experiments, and support vector machine (SVM), random forest (RF) and naive bayes (NB) were used as primary learners, and SVM as secondary learners, to classify lithology through stacking algorithm. Then, the evaluation indexes such as Area Under Curve (AUC), precision, recall and F1-score were used to verify its accuracy, and PI and LIME were used to explain the lithology identi cation model. The study shows that the results of the stacking algorithm have the best indexes and the highest prediction accuracy. In terms of overall interpretation, PHIND, GR and RT have the most in uence on lithology identi cation of a natural gas protection area in the United States; DEN, CAL and PEF have the most in uence on lithology identi cation in Daqing Oil eld in China. Interpreted from the perspective of a single sample, the LIME algorithm is able to give a quantitative prediction probability and the degree of in uence of the characteristic variables.
Lithologic identification plays a crucial role in petroleum geological exploration, and machine learning has become increasingly prevalent in intelligent lithology identification in recent years. However, identifying lithologies presents challenges due to a lack of lithologic labels and an imbalanced distribution of lithologies. To address this issue and obtain satisfactory lithological identification results, this study investigates a class-rebalancing self-training lithology identification framework. This framework employs logging data and limited lithological labels as input, and achieves promising lithology classification through the Class Rebalancing Self-Training (CReST) approach. Four machine learning algorithms with high overall performance are selected from 25 common algorithms to establish CReST models, including Bagging Classifier, Extra Trees Classifier, Random Forest Classifier, and Support Vector Classifier. The classification results of the models are compared and analyzed under three conditions. The experimental findings indicate that (1) under label scarcity, the effect of category recognition varies greatly with different sample numbers; (2) under self-training, overall performance is improved, but the difference in performance caused by category imbalance also increases; and (3) under the CReST framework, the model effectively resolves the identification problems caused by a lack of labels and an imbalanced category distribution. Specifically, the precision of identifying categories with fewer samples is improved by more than 20%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.