Total Organic Carbon Content Prediction in Lacustrine Shale Using Extreme Gradient Boosting Machine Learning Based on Bayesian Optimization

Liu, Xingzhou; Tian, Zhi; Chen, Chang

doi:10.1155/2021/6155663

Cited by 10 publications

(3 citation statements)

References 33 publications

“…The calculation efficiency of the XGBoost model generally decreases with increasing numbers of independent variables in the sample. When considering more independent variables, several variables are randomly selected to reorganize the learning samples so that the model can quickly process smaller learning samples (Liu et al, 2021). XGBoost avoids overfitting with high probability during the training process, thereby ensuring its reliability.…”

Section: Selection Of Machine Learning Methods (mentioning)

confidence: 99%
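
The random selection of variables described in the statement above can be illustrated with a minimal sketch. It assumes the data is a plain list of equal-length feature rows; the helper name `subsample_columns` and the `fraction` parameter are hypothetical and do not come from the cited papers. In the XGBoost library, comparable behaviour is usually controlled by the `colsample_bytree` parameter, though the quoted statement does not name it.

```python
import random


def subsample_columns(rows, fraction, seed=0):
    """Keep a random subset of feature columns (hypothetical helper).

    rows     -- list of equal-length lists, one per learning sample
    fraction -- share of the independent variables to keep, in (0, 1]
    """
    n_features = len(rows[0])
    n_keep = max(1, int(round(fraction * n_features)))
    rng = random.Random(seed)
    # Draw column indices without replacement, then restore column order.
    kept = sorted(rng.sample(range(n_features), n_keep))
    reduced = [[row[j] for j in kept] for row in rows]
    return kept, reduced


# Illustrative data only: two samples with four independent variables each.
rows = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
kept, reduced = subsample_columns(rows, fraction=0.5, seed=42)
```

Each tree then sees a narrower learning sample, which is the speed-up the statement refers to.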

Forecast of lacustrine shale lithofacies types in continental rift basins based on machine learning: A case study from Dongying Sag, Jiyang Depression, Bohai Bay Basin, China

2023

Lacustrine shale in continental rift basins is complex and features a variety of mineralogical compositions and microstructures. The lithofacies type of shale, mainly determined by mineralogical composition and microstructure, is the most critical factor controlling the quality of shale oil reservoirs. Conventional geophysical methods cannot accurately forecast lacustrine shale lithofacies types, thus restricting the progress of shale oil exploration and development. Considering the lacustrine shale in the upper Es4 member of the Dongying Sag in the Jiyang Depression, Bohai Bay Basin, China, as the research object, the lithofacies type was forecast based on two machine learning methods: support vector machine (SVM) and extreme gradient boosting (XGBoost). To improve the forecast accuracy, we applied the following approaches: first, using core and thin section analyses of consecutively cored wells, the lithofacies were finely reclassified into 22 types according to mineralogical composition and microstructure, and the vertical change of lithofacies types was obtained. Second, in addition to commonly used well logging data, paleoenvironment parameter data (Rb/Sr ratio, paleoclimate parameter; Sr %, paleosalinity parameter; Ti %, paleoprovenance parameter; Fe/Mn ratio, paleo-water depth parameter; P/Ti ratio, paleoproductivity parameter) were applied to the forecast. Third, two sample extraction modes, namely, curve shape-to-point and point-to-point, were used in the machine learning process. Finally, the lithofacies type forecast was carried out under six different conditions. In the condition of selecting the curve shape-to-point sample extraction mode and inputting both well logging and paleoenvironment parameter data, the SVM method achieved the highest average forecast accuracy for all lithofacies types, reaching 68%, as well as the highest average forecast accuracy for favorable lithofacies types at 98%.
The forecast accuracy for all lithofacies types improved by 7%–28% by using both well logging and paleoenvironment parameter data rather than using one or the other, and was 7%–8% higher by using the curve shape-to-point sample extraction mode compared to the point-to-point sample extraction mode. In addition, the learning sample quantity and data value overlap of different lithofacies types affected the forecast accuracy. The results of our study confirm that machine learning is an effective solution to forecast lacustrine shale lithofacies. When adopting machine learning methods, increasing the learning sample quantity (>45 groups), selecting the curve shape-to-point sample extraction mode, and using both well logging and paleoenvironment parameter data are effective ways to improve the forecast accuracy of lacustrine shale lithofacies types. The method and results of this study provide guidance to accurately forecast the lacustrine shale lithofacies types in new shale oil wells and will promote the harvest of lacustrine shale oil globally.

Section: Selection Of Machine Learning Methods (mentioning)

confidence: 99%

“…For each step, the loss function values must be calculated, and the objective function to obtain f(x) must be optimized. Finally, an optimal ensemble model is obtained based on the additive method (Liu et al, 2021). K-fold cross-validation was selected to optimize the model parameters in the present study.…”

Section: Figure (mentioning)

confidence: 99%
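
The K-fold cross-validation mentioned in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' procedure: the toy predictor (a scaled training mean), the `shrink` parameter, the candidate values, and the sample targets are all assumptions chosen to keep the example self-contained.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, validation_idx


def cv_mse(y, k, shrink, seed=0):
    """Mean validation MSE of a toy model: shrink * mean(training targets)."""
    scores = []
    for train_idx, validation_idx in k_fold_splits(len(y), k, seed):
        prediction = shrink * sum(y[j] for j in train_idx) / len(train_idx)
        fold_mse = sum((y[j] - prediction) ** 2 for j in validation_idx) / len(validation_idx)
        scores.append(fold_mse)
    return sum(scores) / len(scores)


# Illustrative targets only (not data from any cited study).
y = [1.2, 0.8, 2.5, 3.1, 1.9, 2.2, 0.5, 2.8, 1.4, 3.0]
candidates = [0.5, 0.75, 1.0]
# Pick the parameter value with the lowest cross-validated error.
best = min(candidates, key=lambda s: cv_mse(y, k=5, shrink=s))
```

Every sample is used for validation exactly once, so the score used to compare parameter values is never computed on data the model was fitted to.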

Forecast of lacustrine shale lithofacies types in continental rift basins based on machine learning: A case study from Dongying Sag, Jiyang Depression, Bohai Bay Basin, China

2023

“…To overcome the drawbacks of the current mathematical regression methods, the machine learning technique has been recently introduced for predicting TOC content [18, 20–32]. In these published works, several versions of machine learning models have been developed for TOC content estimation or other properties, including Bayesian regression, random forest (RF), fuzzy logic, neural network, support vector regression (SVR), decision tree and XGBoost, among others.…”

Section: Introduction (mentioning)

confidence: 99%

Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost

et al. 2023

The total organic carbon (TOC) content of organic-rich shale is a key parameter in screening for potential source rocks and sweet spots of shale oil/gas. Traditional methods of determining the TOC content, such as geochemical experiments and the empirical mathematical regression method, are either costly and inefficient, or inaccurate and not universally applicable. In this study, we propose three machine learning models, random forest (RF), support vector regression (SVR), and XGBoost, to predict the TOC content using well logs, and the performance of each model is compared with the traditional empirical methods. First, the decision tree algorithm is used to identify the optimal set of well logs from a total of 15. Then, 816 data points of well logs and the TOC content data collected from five different shale formations are used to train and test these three models. Finally, the accuracy of the three models is validated by predicting the unknown TOC content data from a shale oil well. The results show that the RF model provides the best prediction for the TOC content, with R² = 0.915, MSE = 0.108, and MAE = 0.252, followed by XGBoost, while SVR gives the lowest predictive accuracy. Nevertheless, all three machine learning models outperform the traditional empirical methods such as the Schmoker gamma-ray log method, the multiple linear regression method, and the ΔlgR method. Overall, the proposed machine learning models are powerful tools for predicting the TOC content of shale and improving the oil/gas exploration efficiency in a different formation or a different basin.
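
For readers unfamiliar with the three scores quoted in the abstract above, the sketch below shows how R², MSE, and MAE are conventionally computed from measured and predicted values. The function name `regression_metrics` and the sample numbers are illustrative assumptions, not data or code from the study.

```python
def regression_metrics(y_true, y_pred):
    """Return R^2, mean squared error, and mean absolute error."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    # Total sum of squares around the mean of the measured values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    r2 = 1.0 - (mse * n) / ss_tot
    return {"r2": r2, "mse": mse, "mae": mae}


# Made-up TOC values (wt%) for illustration only.
toc_measured = [1.0, 2.0, 3.0, 4.0]
toc_predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(toc_measured, toc_predicted)
```

A higher R² and lower MSE and MAE indicate a closer match between predicted and measured TOC, which is the basis on which the abstract ranks the three models.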

An approach for total organic carbon prediction using convolutional neural networks optimized by differential evolution

Silva, Saporetti, Yaseen et al. 2023

Neural Comput & Applic
