Seafloor properties, including total organic carbon (TOC), are sparsely measured on a global scale, and interpolation (prediction) techniques are often used as a proxy for observation. Previous geospatial interpolations of seafloor TOC exhibit gaps where few or no observations exist. In contrast, recent machine learning techniques, which rely on geophysical and geochemical properties (e.g., seafloor biomass, porosity, and distance from coast), show promise in making comprehensive, statistically optimal predictions. Here we apply a nonparametric (i.e., data‐driven) machine learning algorithm, specifically k‐nearest neighbors (kNN), to estimate the global distribution of seafloor TOC. Our results include predictor (feature) selection specifically designed to mitigate bias and produce a statistically optimal estimate of seafloor TOC, with uncertainty, at 5 × 5 arc minute resolution. Analysis of sample density in parameter space provides a guide for future sampling. One use of this prediction is to constrain a global inventory: we estimate that the upper 5 cm of the seafloor alone contains about 87 ± 43 gigatons of carbon (Gt C) in organic form.
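To make the kNN idea in the abstract above concrete, the following is a minimal sketch, not the authors' implementation. It assumes standardized predictors, Euclidean distance, and an unweighted mean of the k nearest observations. The function name `knn_predict`, the two predictors, and all numbers are hypothetical, and the neighbor spread is only a crude stand-in for the uncertainty estimate the abstract mentions.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k=3):
    """Predict each query value as the mean of its k nearest training values.

    Returns (mean, std). The std of the neighbor values is an illustrative
    spread only, not the uncertainty estimate described in the abstract.
    """
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_X = np.asarray(query_X, dtype=float)

    # Standardize predictors so no single feature dominates the distance.
    mu = train_X.mean(axis=0)
    sigma = train_X.std(axis=0)
    sigma[sigma == 0] = 1.0
    train_z = (train_X - mu) / sigma
    query_z = (query_X - mu) / sigma

    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(query_z[:, None, :] - train_z[None, :, :], axis=2)

    # Indices of the k closest training samples for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbor_y = train_y[nearest]
    return neighbor_y.mean(axis=1), neighbor_y.std(axis=1)

# Hypothetical toy data: columns are porosity and distance from coast (km);
# target values are TOC in weight percent.
train_X = [[0.80, 10], [0.78, 15], [0.75, 20],
           [0.50, 900], [0.48, 950], [0.45, 1000]]
train_y = [2.0, 1.8, 1.6, 0.3, 0.2, 0.1]

toc_mean, toc_std = knn_predict(train_X, train_y, [[0.79, 12], [0.47, 960]], k=3)
print(toc_mean, toc_std)
```

In this toy setup the first query sits among the near-coast, high-porosity samples and the second among the far-offshore ones, so each prediction is the average of its own cluster. A large neighbor spread flags regions of parameter space where nearby samples disagree or are sparse.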
A newly compiled, open-source database of focused fluid flow sites (e.g., cold seeps) and associated SEAfloor FLuid Expulsion Anomalies (SEAFLEAs) reveals a variable distribution of anomalies across global continental margins. The SEAFLEA distribution is heavily biased toward North American continental margins, with most observations globally at water depths between 100 and 200 m, and is evenly split between active and passive margins. Using a machine learning classification methodology based on outlier detection algorithms, we predict the probability of encountering a SEAFLEA globally. Results show the highest probability in regions with multiple SEAFLEA observations and in parametrically similar regions concentrated on continental margins. In general, geologic, biologic, and chemical predictors are the most informative predictors of SEAFLEAs. We validate our results using both random and geospatial validation techniques, which reveal that our methods are robust to random variations in observations, but that certain margins, such as the Svalbard Margin, represent parametrically distinct locales. Because of their unique features, these distinct regions exert control over the global distribution of predicted anomalies. Our final prediction on a global 5 × 5 arc minute grid reveals that the average probability of encountering a SEAFLEA is 33.1 ± 17.7% on active margins and 31.2 ± 18.9% on passive margins, indicating a comparable likelihood of encountering fluid expulsion on passive and active margins. Therefore, the lateral compaction on active margins does not increase the likelihood of fluid expulsion relative to the predominantly vertical compaction on passive margins. These results, however, say nothing about fluid flux rates or the density of expulsion features.
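The abstract above does not name the specific outlier detection algorithms used, so the following is only an illustrative sketch of the general idea: score each grid cell by how closely its predictors resemble those at observed sites. It swaps in a simple Mahalanobis-distance score, which is an assumption rather than the authors' method. The function name `similarity_to_observed`, the two predictors, and all values are hypothetical, and the output is an uncalibrated similarity in [0, 1], not a probability.

```python
import numpy as np

def similarity_to_observed(observed_X, grid_X):
    """Score grid cells by closeness to the observed sites in predictor space.

    Uses the squared Mahalanobis distance to the observed-site distribution
    and maps it to [0, 1] with exp(-d2 / 2). Cells that are outliers relative
    to the observed sites receive scores near 0.
    """
    observed_X = np.asarray(observed_X, dtype=float)
    grid_X = np.asarray(grid_X, dtype=float)

    mean = observed_X.mean(axis=0)
    cov = np.cov(observed_X, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance

    diff = grid_X - mean
    # Squared Mahalanobis distance for every grid cell.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.exp(-0.5 * d2)

# Hypothetical predictors at observed sites: water depth (m) and
# sediment thickness (km).
observed_X = [[150, 2.0], [120, 2.5], [180, 1.5], [160, 2.2], [140, 1.8]]

# Two hypothetical grid cells: one typical of the observed sites, one abyssal.
grid_X = [[150, 2.0], [4000, 0.1]]

scores = similarity_to_observed(observed_X, grid_X)
print(scores)
```

A score like this also illustrates why parametrically distinct margins matter in validation: withholding all sites from one unusual region shifts the mean and covariance, and with them the scores everywhere else.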
Indian rhesus macaque major histocompatibility complex (MHC) variation can influence the outcomes of transplantation and infectious disease studies. Frequently, rhesus macaques are MHC genotyped to identify variants that could account for unexpected results. Since the MHC is only one region in the genome where variation could impact experimental outcomes, strategies for simultaneously profiling variation in the macaque MHC and the remainder of the protein coding genome would be useful. Here we introduce macaque exome sequence (MES) genotyping, in which MHC class I and class II genotypes are determined with high confidence using target-enrichment probes that are enriched for MHC sequences. For a cohort of 27 Indian rhesus macaques, we describe two methods for obtaining MHC genotypes from MES data and demonstrate that the MHC class I and class II genotyping results obtained with these methods are 98.1% and 98.7% concordant, respectively, with expected MHC genotypes. In contrast, conventional MHC genotyping results obtained by deep sequencing of short multiplex PCR amplicons were only 92.6% concordant with expectations for this cohort.