Standard classification algorithms are generally designed to maximize the number of correct predictions (concordance). Maximizing concordance may not be appropriate in certain applications. In practice, some applications emphasize high sensitivity (e.g., clinical diagnostic tests) and others emphasize high specificity (e.g., epidemiology screening studies). This paper considers the effects of the decision threshold on sensitivity, specificity, and concordance for four classification methods: logistic regression, classification tree, Fisher's linear discriminant analysis, and a weighted k-nearest neighbor. We investigated the use of decision threshold adjustment to improve either the sensitivity or the specificity of a classifier under specific conditions. We conducted a Monte Carlo simulation showing that as the decision threshold increases, the sensitivity decreases and the specificity increases, but the concordance values in an interval around the maximum concordance are similar. For specified sensitivity and specificity levels, an optimal decision threshold that meets the specified requirement might be determined in an interval around the maximum concordance. Three example data sets were analyzed for illustration.
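The trade-off described in the abstract above can be illustrated with a minimal sketch. It assumes a classifier that outputs a score per sample and predicts the positive class when the score is at or above the threshold. The function name `threshold_metrics` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def threshold_metrics(y_true, scores, threshold):
    """Return (sensitivity, specificity, concordance) at one decision threshold.

    Assumes y_true contains at least one positive and one negative label.
    """
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(scores, dtype=float) >= threshold  # predicted positive
    tp = np.sum(pred & y_true)      # true positives
    tn = np.sum(~pred & ~y_true)    # true negatives
    fp = np.sum(pred & ~y_true)     # false positives
    fn = np.sum(~pred & y_true)     # false negatives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    concordance = (tp + tn) / y_true.size  # fraction of correct predictions
    return sensitivity, specificity, concordance

# Toy data (illustrative only): three positives followed by three negatives.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]

for t in (0.35, 0.5, 0.7):
    sens, spec, conc = threshold_metrics(y_true, scores, t)
    print(f"threshold={t:.2f} sensitivity={sens:.2f} "
          f"specificity={spec:.2f} concordance={conc:.2f}")
```

On this toy data, raising the threshold lowers sensitivity and raises specificity, while concordance changes comparatively little. A threshold that satisfies a required sensitivity or specificity could be chosen by scanning a grid of candidate values in this way.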
Analysis of the cow microbiome, as well as host genetic influences on the establishment and colonization of the rumen microbiota, is critical for development of strategies to manipulate ruminal function toward more efficient and environmentally friendly milk production. To this end, the development and validation of noninvasive methods to sample the rumen microbiota at large scale is required. Here, we further optimized the analysis of buccal swab samples as a proxy for direct bacterial samples of the rumen of dairy cows. To identify an optimal time for sampling, we collected buccal swab and rumen samples at six different time points relative to animal feeding. We then evaluated several biases in these samples using a machine learning classifier (random forest) to select taxa that discriminate between buccal swab and rumen samples. Differences in the inverse Simpson's diversity, Shannon's evenness and Bray-Curtis dissimilarities between methods were significantly less apparent when sampling was performed prior to morning feeding (P<0.05), suggesting that this time point was optimal for representative sampling. In addition, the random forest classifier was able to accurately identify non-rumen taxa, including 10 oral and putative feed-associated taxa. Two highly prevalent (> 60%) taxa in buccal and rumen samples had significant variance in relative abundance between sampling methods, but could be qualitatively assessed via regular buccal swab sampling. This work not only provides new insights into the oral community of ruminants, but further validates and refines buccal swabbing as a method to assess the rumen bacterial community in large herds.
IMPORTANCE The gastrointestinal tract of ruminants harbors a diverse microbial community that coevolved symbiotically with the host, influencing its nutrition, health and performance. While the influence of environmental factors on rumen microbes is well-documented, the process by which host genetics influences the establishment and colonization of the rumen microbiota still needs to be elucidated. This knowledge gap is due largely to our inability to easily sample the rumen microbiota. There are three common methods for rumen sampling, but each presents at least one disadvantage related to animal welfare, sample quality, labor, or scalability. The development and validation of non-invasive methods, such as buccal swabbing, for large-scale rumen sampling is needed to support studies that require large sample sizes to generate reliable results. The validation of buccal swabbing will also support the development of molecular tools for the early diagnosis of metabolic disorders associated with microbial changes in large herds.
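The three community measures named in the abstract above (inverse Simpson's diversity, Shannon's evenness, and Bray-Curtis dissimilarity) can be sketched from their textbook definitions. This is an illustration only: the abstract does not state which formulation or software the authors used, so the evenness here is assumed to be Pielou's (Shannon entropy divided by the log of the number of observed taxa), and all function names and counts are hypothetical.

```python
import numpy as np

def inverse_simpson(counts):
    """Inverse Simpson diversity: 1 / sum(p_i^2) over relative abundances p_i."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 / np.sum(p ** 2)

def shannon_evenness(counts):
    """Pielou's evenness: Shannon entropy H divided by ln(S), S = observed taxa.

    Assumes at least two taxa have nonzero counts (otherwise ln(S) is zero).
    """
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]          # drop absent taxa so log(p) is defined
    p = counts / counts.sum()
    h = -np.sum(p * np.log(p))
    return h / np.log(p.size)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors over the same taxa."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.abs(a - b).sum() / (a + b).sum()

# Hypothetical taxon counts for a paired buccal swab and rumen sample.
buccal = [6, 4, 0, 10]
rumen = [4, 6, 2, 8]
print(inverse_simpson(rumen), shannon_evenness(rumen), bray_curtis(buccal, rumen))
```

A Bray-Curtis value of 0 means the two samples have identical abundances and 1 means they share no taxa, so smaller values between paired buccal and rumen samples would indicate a more representative swab.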
The accurate identification of low-frequency variants in tumors remains an unsolved problem. To support characterization of the issues in a realistic setting, we have developed software tools and a reference dataset for diagnosing variant calling pipelines. The dataset contains millions of variants at frequencies ranging from 0.05 to 1.0. To generate the dataset, we performed whole-genome sequencing of a mixture of two Coriell cell lines, NA19240 and NA12878, the mothers of YRI (Y) and CEU (C) HapMap trios, respectively. The cells were mixed in three different proportions, 10Y/90C, 50Y/50C and 90Y/10C, in an effort to simulate the heterogeneity found in tumor samples. We sequenced three biological replicates for each mixture, yielding approximately 1.4 billion reads per mixture for an average of 64X coverage. Using the published genotypes as our reference, we evaluate the performance of a general variant calling algorithm, constructed as a demonstration of our flexible toolset, and make comparisons to a standard GATK pipeline. We estimate the overall FDR to be 0.028 and the FNR (when coverage exceeds 20X) to be 0.019 in the 50Y/50C mixture. Interestingly, even with these relatively well-studied individuals, we predict over 475,000 new variants, validating in well-behaved coding regions at a rate of 0.97, that were not included in the published genotypes.
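The frequency range quoted in the abstract above (0.05 to 1.0) is consistent with simple mixing arithmetic, sketched below. The sketch assumes that both cell lines are diploid, that the mixing proportions translate directly into DNA fractions, and that reads are sampled without bias. The function name and its parameters are hypothetical and do not come from the authors' toolset.

```python
def expected_variant_frequency(frac_y, alt_copies_y, alt_copies_c, ploidy=2):
    """Expected alternate-allele frequency in a Y/C mixture.

    frac_y        fraction of the mixture contributed by cell line Y (0..1)
    alt_copies_y  copies of the alternate allele in Y (0, 1 or 2 if diploid)
    alt_copies_c  copies of the alternate allele in C
    """
    frac_c = 1.0 - frac_y
    return (frac_y * alt_copies_y + frac_c * alt_copies_c) / ploidy

# Heterozygous in Y only, 10Y/90C mixture: the low end of the range.
print(expected_variant_frequency(0.10, 1, 0))
# Homozygous alternate in both lines: the high end, regardless of mixture.
print(expected_variant_frequency(0.10, 2, 2))
# Homozygous in Y, heterozygous in C, 50Y/50C mixture.
print(expected_variant_frequency(0.50, 2, 1))
```

Under these assumptions, a variant heterozygous in only the minor (10%) cell line is expected at a frequency of about 0.05, which is the kind of low-frequency call the dataset is meant to stress.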
The addition of cattle health and immunity traits to genomic selection indices holds promise to increase individual animal longevity and productivity, and decrease economic losses from disease. However, highly variable genomic loci that contain multiple immune-related genes were poorly assembled in the first iterations of the cattle reference genome assembly and underrepresented during the development of most commercial genotyping platforms. As a consequence, there is a paucity of genetic markers within these loci that may track haplotypes related to disease susceptibility. By using hierarchical assembly of bacterial artificial chromosome inserts spanning 3 of these immune-related gene regions, we were able to assemble multiple full-length haplotypes of the major histocompatibility complex, the leukocyte receptor complex, and the natural killer cell complex. Using these new assemblies and the recently released ARS-UCD1.2 reference, we aligned whole-genome shotgun reads from 125 sequenced Holstein bulls to discover candidate variants for genetic marker development. We selected 124 SNPs, using heuristic and statistical models, to develop a custom genotyping panel. In a proof-of-principle study, we used this custom panel to genotype 1,797 Holstein cows exposed to bovine tuberculosis (bTB) that were the subject of a previous GWAS using the Illumina BovineHD array. Although we did not identify any significant association of bTB phenotypes with these new genetic markers, 2 markers exhibited substantial effects on bTB phenotypic prediction. The models and parameters trained in this study serve as a guide for future marker discovery surveys, particularly in previously unassembled regions of the cattle genome.