An Exploratory Data Analysis (EDA) aims to use Synthetic Aperture Radar (SAR) measurements to discriminate between two oil slick types observed on the sea surface: naturally-occurring oil seeps versus human-related oil spills. The use of satellite sensors for this task is poorly documented in the scientific literature. A long-term RADARSAT dataset (2008–2012) is exploited to investigate oil slicks in Campeche Bay (Gulf of Mexico). Simple Classification Algorithms to distinguish the oil slick type are designed based on standard multivariate data analysis techniques. Various attributes of geometry, shape, and dimension that describe the oil slick Size Information are combined with SAR-derived backscatter coefficients: sigma-naught (σ°), beta-naught (β°), and gamma-naught (γ°). The combination of several of these characteristics is capable of distinguishing the oil slick type with ~70% overall accuracy; however, the sole and simple use of two specific pieces of oil-slick Size Information (i.e., area and perimeter) is equally capable of distinguishing seeps from spills. The data mining exercise of our EDA promotes a novel idea bridging petroleum pollution and remote sensing research, thus paving the way to further investigate, for systematic use, how the satellite synoptic view expresses geophysical differences between seeped and spilled oil observed on the sea surface.
Our research focuses on refining the ability to discriminate two petrogenic oil-slick categories: the sea surface expression of naturally-occurring oil seeps and man-made oil spills. For that, a long-term RADARSAT-2 dataset (244 scenes imaged between 2008 and 2012) is analyzed to investigate oil slicks (4562) observed in the Gulf of Mexico (Campeche Bay, Mexico). As the scientific literature on the use of satellite-derived measurements to discriminate the oil-slick category is sparse, our research addresses this gap by extending our previous investigations aimed at discriminating seeps from spills. To reveal hidden traits of the available satellite information and to evaluate an existing Oil-Slick Discrimination Algorithm, distinct processing segments methodically inspect the data at several levels: input data repository, data transformation, attribute selection, and multivariate data analysis. Different attribute selection strategies similarly excel at the seep-spill differentiation. The combination of different Oil-Slick Information Descriptors presents comparable discrimination accuracies. Among 8 non-linear transformations, the Logarithm and Cube Root normalizations disclose the most effective discrimination power of almost 70%. Our refined analysis corroborates and consolidates our earlier findings, providing a firmer basis and useful accuracies of the seep-spill discrimination practice using information acquired with space-borne surveillance systems based on Synthetic Aperture Radars.
We classify low-backscatter regions observed in Synthetic Aperture Radar (SAR) measurements of the surface of the ocean as either oil slicks or look-alike slicks (radar false targets). Our proposed classification algorithm is based on Linear Discriminant Analyses (LDAs) of RADARSAT-1 measurements (402 scenes off the southeast coast of Brazil from July 2001 to June 2003) and Meteorological-Oceanographic (MetOc) data from other earth observation sensors: Advanced Very High Resolution Radiometer (AVHRR), Sea-Viewing Wide Field-of-View Sensor (SeaWiFS), Moderate Resolution Imaging Spectroradiometer (MODIS), and Quick Scatterometer (QuikSCAT). Oil slicks are sea-surface expressions of exploration and production oil, ship- and orphan-spills. False targets are associated with environmental phenomena, such as biogenic films, algal blooms, upwelling, low wind, or rain cells. Both categories have been interpreted by domain-experts: mineral oil (n = 350; 45.5%) and petroleum free (n = 419; 54.5%). We explore nine size variables (area, perimeter, etc.) and three types of MetOc information (sea surface temperature, chlorophyll-a, and wind speed) that describe the 769 samples analyzed. Seven attribute–domain combinations are tested with three non-linear transformations (none, cube root, log10), with and without MetOc, adding to 39 attribute subdivisions. Classification accuracies are independent of data transformation and improve when selected size attributes are combined with MetOc, leading to overall accuracies of ~80% and sound levels of sensitivity (~90%), specificity (~80%), positive (~80%) and negative (~90%) predictive values. The effectiveness of this data-driven attempt supports further commercial or academic implementation of our LDA algorithm.
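The accuracy measures named in the abstract above (overall accuracy, sensitivity, specificity, positive and negative predictive values) all derive from a 2×2 confusion matrix. The following is a minimal sketch of those definitions, treating "oil slick" as the positive class; the `ConfusionCounts` class, the `summarize` function, and the counts themselves are hypothetical and chosen for illustration only, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary oil-slick (positive) vs look-alike (negative) classifier."""
    tp: int  # oil slicks classified as oil slicks
    fn: int  # oil slicks classified as look-alikes
    fp: int  # look-alikes classified as oil slicks
    tn: int  # look-alikes classified as look-alikes


def summarize(c: ConfusionCounts) -> dict:
    """Return the five standard measures derived from the confusion counts."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "overall_accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),   # true positive rate
        "specificity": c.tn / (c.tn + c.fp),   # true negative rate
        "ppv": c.tp / (c.tp + c.fp),           # positive predictive value
        "npv": c.tn / (c.tn + c.fn),           # negative predictive value
    }


# Hypothetical counts for illustration only (NOT the study's results).
toy = ConfusionCounts(tp=9, fn=1, fp=2, tn=8)
metrics = summarize(toy)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all five measures rather than overall accuracy alone matters when the two classes are unbalanced, because a classifier can score well overall while missing most samples of the smaller class.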
A novel empirical approach to categorize oil slicks’ sea surface expressions in synthetic aperture radar (SAR) measurements into oil seeps or oil spills is investigated, contributing both to academic remote sensing research and to practical applications for the petroleum industry. We use linear discriminant analysis (LDA) to seek accuracy improvements over our previously published methods of discriminating seeps from spills, which achieved ~70% overall accuracy. Analyzing 244 RADARSAT-2 scenes containing 4562 slicks observed in Campeche Bay (Gulf of Mexico), our exploratory data analysis evaluates the impact of 61 combinations of SAR backscatter coefficients (σ°, β°, γ°), SAR calibrated products (received radar beam given in amplitude or decibel, with or without a despeckle filter), and data transformations (none, cube root, log10). The LDA ability to discriminate the oil-slick category is rather independent of backscatter coefficients and calibrated products, but is influenced by data transformations. The combination of attributes plays a role in the discrimination; combining oil-slicks’ size and SAR information is more effective. We have simplified our analyses using fewer attributes to reach accuracies comparable to those of our earlier studies, and we suggest using other multivariate data analyses (cubist or random forest) to attempt to further improve oil-slick category discrimination.
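The three data transformations listed above (none, cube root, log10) are commonly applied to right-skewed attributes such as slick area before a linear method like LDA. The sketch below shows their effect on skewness, assuming strictly positive values; the `areas` sample is synthetic (log-normal), and the `skewness` helper and `TRANSFORMS` table are illustrative names, not part of the published method.

```python
import math
import random


def skewness(values):
    """Population skewness: mean cubed deviation divided by std cubed."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


TRANSFORMS = {
    "none": lambda x: x,
    "cube_root": lambda x: x ** (1.0 / 3.0),  # valid for non-negative x
    "log10": math.log10,                      # valid for x > 0
}

# Synthetic, strictly positive "area" values (log-normal); NOT real slick data.
rng = random.Random(0)
areas = [rng.lognormvariate(1.0, 1.0) for _ in range(2000)]

skew_by_transform = {
    name: skewness([f(a) for a in areas]) for name, f in TRANSFORMS.items()
}
for name, s in skew_by_transform.items():
    print(f"{name:>9}: skewness = {s:+.2f}")
```

On data of this kind the logarithm brings the distribution closest to symmetric and the cube root lands in between. That is one plausible reason a discriminant that works from class means and a pooled covariance would respond to the choice of transformation.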
Linear discriminant analysis (LDA) is a mathematically robust multivariate data analysis approach that is sometimes used for surface oil slick signature classification. Our goal is to rank the effectiveness of LDAs to differentiate oil spills from look-alike slicks. We explored multiple combinations of (i) variables (size information, Meteorological-Oceanographic (metoc), geo-location parameters) and (ii) data transformations (non-transformed, cube root, log10). Active and passive satellite-based measurements of RADARSAT, QuikSCAT, AVHRR, SeaWiFS, and MODIS were used. Results from two experiments are reported and discussed: (i) an investigation of 60 combinations of several attributes subjected to the same data transformation and (ii) a survey of 54 other data combinations of three selected variables subjected to different data transformations. In Experiment 1, the best discrimination was reached using ten cube-root-transformed attributes: ~85% overall accuracy using six pieces of size information, three metoc variables, and one geo-location parameter. In Experiment 2, two combinations of three variables tied as the most effective: ~81% overall accuracy using area (log-transformed), length-to-width ratio (log- or cube-root-transformed), and number of feature parts (non-transformed). After verifying the classification accuracy of 114 algorithms by comparing with expert interpretations, we concluded that applying different data transformations and accounting for metoc and geo-location attributes optimizes the accuracies of binary classifiers (oil spill vs. look-alike slicks) using the simple LDA technique.
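For readers unfamiliar with the technique, a two-class LDA reduces to Fisher's linear discriminant: project each sample onto w = Sw⁻¹(m₁ − m₀) and compare the result with a threshold. The following is a minimal sketch for two features with equal class priors, run on synthetic Gaussian data. The function names, the cluster centres, and the two-feature restriction are assumptions made for illustration; the sketch does not reproduce the authors' implementation or attributes.

```python
import random


def mean_vec(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter(rows, m):
    """Within-class 2x2 scatter matrix, returned as (sxx, sxy, syy)."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy


def fit_lda(class0, class1):
    """Return (w, threshold) for Fisher's two-class linear discriminant."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    a0, b0, d0 = scatter(class0, m0)
    a1, b1, d1 = scatter(class1, m1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1  # pooled scatter Sw = [[a, b], [b, d]]
    det = a * d - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = Sw^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix
    w = [(d * dx - b * dy) / det, (-b * dx + a * dy) / det]
    # Threshold at the midpoint of the projected class means (equal priors)
    thr = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, thr


def predict(w, thr, row):
    return 1 if w[0] * row[0] + w[1] * row[1] > thr else 0


# Synthetic two-feature samples for two classes; NOT real slick attributes.
rng = random.Random(42)


def sample(cx, cy, n):
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


train0, train1 = sample(1.0, 2.0, 200), sample(2.5, 3.0, 200)
test0, test1 = sample(1.0, 2.0, 100), sample(2.5, 3.0, 100)

w, thr = fit_lda(train0, train1)
correct = sum(predict(w, thr, r) == 0 for r in test0)
correct += sum(predict(w, thr, r) == 1 for r in test1)
accuracy = correct / (len(test0) + len(test1))
print(f"held-out overall accuracy: {accuracy:.3f}")
```

With more than two attributes the same formula applies, but the pooled scatter matrix would be inverted with a linear algebra library rather than by hand.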
Sea-surface petroleum pollution is observed as “oil slicks” (i.e., “oil spills” or “oil seeps”) and can be confused with “look-alike slicks” (i.e., environmental phenomena, such as low-wind speed, upwelling conditions, chlorophyll, etc.) in synthetic aperture radar (SAR) measurements, the most proficient satellite sensor to detect mineral oil on the sea surface. Even though machine learning (ML) has become widely used to classify remotely-sensed petroleum signatures, few papers have been published comparing various ML methods to distinguish spills from look-alikes. Our research fills this gap by comparing and evaluating six traditional techniques: simple (naive Bayes (NB), K-nearest neighbor (KNN), decision trees (DT)) and advanced (random forest (RF), support vector machine (SVM), artificial neural network (ANN)) applied to different combinations of satellite-retrieved attributes. Thirty-six ML algorithms were used to discriminate “ocean-slick signatures” (spills versus look-alikes) with ten-times repeated random subsampling cross validation (70-30 train-test partition). We found that the best algorithm (ANN: 90%) was >20 percentage points more accurate than the least accurate one (DT: ~68%).
Our empirical ML observations contribute to both scientific ocean remote-sensing research and to oil and gas industry activities, in that: (i) most techniques were superior when morphological information and Meteorological and Oceanographic (MetOc) parameters were included together, and less accurate when these variables were used separately; (ii) the better-performing algorithms used more variables (without feature selection), while the lower-accuracy algorithms used fewer variables (with feature selection); (iii) we created algorithms more effective than those of past benchmark studies that used linear discriminant analysis (LDA: ~85%) on the same dataset; and (iv) accurate algorithms can assist in finding new offshore fossil fuel discoveries by reducing misclassification.
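The validation scheme named in this abstract, ten-times repeated random subsampling with a 70-30 train-test partition, can be sketched as follows. A nearest-centroid classifier stands in for the six methods compared in the paper purely to keep the example self-contained. The function names and the synthetic two-class data are assumptions, not material from the study.

```python
import random
import statistics


def nearest_centroid_fit(X, y):
    """Return {label: centroid} from feature rows X and labels y."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def repeated_subsampling(X, y, repeats=10, train_frac=0.7, seed=0):
    """Held-out accuracy for each of `repeats` independent random splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_train = int(round(train_frac * len(X)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)  # a fresh random partition on every repeat
        train, test = idx[:n_train], idx[n_train:]
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return scores


# Synthetic two-class data (two Gaussian clusters); NOT real slick samples.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

scores = repeated_subsampling(X, y)
print(f"mean accuracy over {len(scores)} splits: {statistics.mean(scores):.3f}")
print(f"std of accuracy: {statistics.stdev(scores):.3f}")
```

Unlike k-fold cross validation, the held-out sets of different repeats may overlap. The spread of the ten scores gives a rough sense of how much a reported accuracy depends on the particular split.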
The paper introduces the Oil-Slick Hub (OSH), a computational platform to facilitate the data visualization of a large database of petroleum signatures observed on the surface of the ocean with synthetic aperture radar (SAR) measurements. This Internet platform offers an information search and retrieval system of a database resulting from >20 years of scientific projects that interpreted ~15 thousand offshore mineral oil “slicks”: natural oil “seeps” versus operational oil “spills”. Such a Digital Mega-Collection Database consists of satellite images and oil-slick polygons identified in the Gulf of Mexico (GMex) and the Brazilian Continental Margin (BCM). A series of attributes describing the interpreted slicks are also included, along with technical reports and scientific papers. Two experiments illustrate the use of the OSH to facilitate the selection of data subsets from the mega collection (GMex variables and BCM samples), in which artificial intelligence techniques—machine learning (ML)—classify slicks into seeps or spills. The GMex variable dataset was analyzed with simple linear discriminant analyses (LDAs), and a three-fold accuracy performance pattern was observed: (i) the least accurate subset (~65%) solely used acquisition aspects (e.g., acquisition beam mode, date, and time, satellite name, etc.); (ii) the best results (>90%) were achieved with the inclusion of location attributes (i.e., latitude, longitude, and bathymetry); and (iii) moderate performances (~70%) were reached using only morphological information (e.g., area, perimeter, perimeter to area ratio, etc.). 
The BCM sample dataset was analyzed with six traditional ML methods, namely naive Bayes (NB), K-nearest neighbors (KNN), decision trees (DT), random forests (RF), support vector machines (SVM), and artificial neural networks (ANN), and the most effective algorithms per sample subsets were: (i) RF (86.8%) for Campos, Santos, and Ceará Basins; (ii) NB (87.2%) for Campos with Santos Basins; (iii) SVM (86.9%) for Campos with Ceará Basins; and (iv) SVM (87.8%) for only Campos Basin. The OSH can assist in different concerns (general public, social, economic, political, ecological, and scientific) related to petroleum exploration and production activities, serving as an important aid in discovering new offshore exploratory frontiers, avoiding legal penalties on oil-seep events, supporting oceanic monitoring systems, and providing valuable information to environmental studies.