We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1 df chi-squared statistic, support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-square-ranked SNPs, where r is the number of SNPs with P-values within the Bonferroni correction, we find that both improve the ranks of causal variants and associated regions and achieve higher power on simulated data. These improvements, however, as well as stability of the SVM and RF rankings, progressively decrease as the cutoff increases to 5r and 10r. As applications we compare the ranks of previously replicated SNPs in real data, associated regions in type 1 diabetes, as provided by the Type 1 Diabetes Consortium, and disease risk prediction accuracies as given by top ranked SNPs by the three methods. Software and webserver are available at http://svmsnps.njit.edu.
The integration of rapid assays, large datasets, informatics, and modeling can overcome current barriers in understanding nanomaterial structure–toxicity relationships by providing a weight-of-the-evidence mechanism to generate hazard rankings for nanomaterials. Here, we present the use of a rapid, low-cost assay to perform screening-level toxicity evaluations of nanomaterials in vivo. Calculated EZ Metric scores, a combined measure of morbidity and mortality in developing embryonic zebrafish, were established at realistic exposure levels and used to develop a hazard ranking of diverse nanomaterial toxicity. Hazard ranking and clustering analysis of 68 diverse nanomaterials revealed distinct patterns of toxicity related to both the core composition and outermost surface chemistry of nanomaterials. The resulting clusters guided the development of a surface chemistry-based model of gold nanoparticle toxicity. Our findings suggest that risk assessments based on the size and core composition of nanomaterials alone may be wholly inappropriate, especially when considering complex engineered nanomaterials. Research should continue to focus on methodologies for determining nanomaterial hazard based on multiple sub-lethal responses following realistic, low-dose exposures, thus increasing the availability of quantitative measures of nanomaterial hazard to support the development of nanoparticle structure–activity relationships.Electronic supplementary materialThe online version of this article (doi:10.1007/s11051-015-3051-0) contains supplementary material, which is available to authorized users.
Background: Identification of RNA homologs within genomic stretches is difficult when pairwise sequence identity is low or unalignable flanking residues are present. In both cases structuresequence or profile/family-sequence alignment programs become difficult to apply because of unreliable RNA structures or family alignments. As such, local sequence-sequence alignment programs are frequently used instead. We have recently demonstrated that maximal expected accuracy alignments using partition function match probabilities (implemented in Probalign) are significantly better than contemporary methods on heterogeneous length protein sequence datasets, thus suggesting an affinity for local alignment.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.