Abstract: Multi-label classification has attracted an increasing amount of attention in recent years. To this end, many algorithms have been developed to classify multi-label data in an effective manner. However, they usually do not consider the pairwise relations indicated by sample labels, which actually play important roles in multi-label classification. Inspired by this, we naturally extend the traditional pairwise constraints to the multi-label scenario via a flexible thresholding scheme. Moreover, to improve the generalization ability of the classifier, we adopt a boosting-like strategy to construct a multi-label ensemble from a group of base classifiers. To achieve these goals, this paper presents a novel multi-label classification framework named Variable Pairwise Constraint projection for Multi-label Ensemble (VPCME). Specifically, we take advantage of the variable pairwise constraint projection to learn a lower-dimensional data representation, which preserves the correlations between samples and labels. Thereafter, the base classifiers are trained in the new data space. For the boosting-like strategy, we employ both the variable pairwise constraints and the bootstrap steps to diversify the base classifiers. Empirical studies have shown the superiority of the proposed method in comparison with other approaches.
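The thresholding idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how label agreement is measured, so the Jaccard similarity, the function names, and the threshold value of 0.5 here are all assumptions made for illustration, not the VPCME definition: pairs of samples whose label sets are similar enough become must-link constraints, and the rest become cannot-link constraints.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two label sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two unlabeled samples are treated as fully similar.
        return 1.0
    return len(a & b) / len(union)


def pairwise_constraints(label_sets, threshold=0.5):
    """Split all sample pairs into must-link and cannot-link lists.

    A pair is must-link when its label similarity reaches the threshold,
    otherwise cannot-link. Changing the threshold changes the constraint sets.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(label_sets)), 2):
        if jaccard(label_sets[i], label_sets[j]) >= threshold:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


# Hypothetical toy labels, not data from the paper.
labels = [{"sports", "news"}, {"sports"}, {"music"}, {"sports", "news", "music"}]
must_link, cannot_link = pairwise_constraints(labels, threshold=0.5)
print(must_link)
print(cannot_link)
```

Lowering the threshold moves pairs from the cannot-link list to the must-link list, which is one way a varying threshold could yield different constraint sets for different base classifiers.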
Traditional methodologies for assessing chemical toxicity are expensive and time-consuming. Computational modeling approaches have emerged as low-cost alternatives, especially those used to develop quantitative structure–activity relationship (QSAR) models. However, conventional QSAR models have limited training data, leading to low predictivity for new compounds. We developed a data-driven modeling approach for constructing carcinogenicity-related models and used these models to identify potential new human carcinogens. To this end, we used a probe carcinogen dataset from the US Environmental Protection Agency’s Integrated Risk Information System (IRIS) to identify relevant PubChem bioassays. Responses of 25 PubChem assays were significantly relevant to carcinogenicity. Eight of these assays showed predictivity for carcinogenicity and were selected for QSAR model training. Using 5 machine learning algorithms and 3 types of chemical fingerprints, 15 QSAR models were developed for each PubChem assay dataset. These models showed acceptable predictivity during 5-fold cross-validation (average CCR = 0.71). Using our QSAR models, we could correctly predict and rank the carcinogenic potentials of 342 IRIS compounds (PPV = 0.72). The models predicted potential new carcinogens, which were validated by a literature search. This study presents an automated technique that can be applied to prioritize potential toxicants using validated QSAR models based on extensive training sets from public data resources.
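The two metrics quoted in the abstract above can be sketched as follows. This assumes CCR denotes the correct classification rate in its common form, the mean of sensitivity and specificity; the abstract itself does not define it. The algorithm and fingerprint names are placeholders, since the abstract gives only their counts, and the confusion-matrix counts are invented for illustration.

```python
from itertools import product


def ccr(tp, tn, fp, fn):
    """Correct classification rate, assumed here to be (sensitivity + specificity) / 2."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def ppv(tp, fp):
    """Positive predictive value: the fraction of predicted positives that are true."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Placeholder names: the abstract states 5 algorithms and 3 fingerprint types
# but does not list them.
algorithms = [f"algorithm_{i}" for i in range(1, 6)]
fingerprints = [f"fingerprint_{i}" for i in range(1, 4)]
model_grid = list(product(algorithms, fingerprints))
print(len(model_grid))  # one model per (algorithm, fingerprint) pair

# Invented counts for illustration only, not results from the study.
print(round(ccr(tp=40, tn=30, fp=10, fn=20), 3))
print(round(ppv(tp=40, fp=10), 3))
```

Averaging sensitivity and specificity keeps the score informative when active and inactive compounds are imbalanced, which is typical for bioassay data.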
Background: Early detection of heart failure is the basis for better medical treatment and prognosis. Over the last decades, both the prevalence and the incidence of heart failure have increased worldwide, making it a significant global public health issue. However, early diagnosis is not an easy task because the symptoms of heart failure are usually non-specific. Therefore, this study aims to develop a machine learning-based risk prediction model for incident heart failure. Although African Americans have the highest risk of incident heart failure among all populations, few studies have developed a heart failure risk prediction model for African Americans. Methods: This research implemented Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, support vector machine, random forest, and Extreme Gradient Boosting (XGBoost) to build a predictive model on Jackson Heart Study data. In the analysis of real data, missing values are problematic when building a predictive model. Here, we evaluate the inclusion of predictors with various missing rates and different imputation strategies to identify the best analytic approach. Results: Among the hundreds of models that we examined, the best predictive model was an XGBoost model that included variables with a missing rate of less than 30 percent, with missing values imputed by non-parametric random forest imputation. This XGBoost model achieved an Area Under Curve (AUC) of 0.8409 for predicting heart failure in the Jackson Heart Study. Conclusion: This research identifies variations of diabetes medication as the most crucial risk factor for heart failure, a finding that the complete-case approach failed to uncover.
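The predictor-screening step described in the Results above, keeping only variables with a missing rate below 30 percent, can be sketched in a few lines. The column names and values are hypothetical, missing entries are represented as `None` by assumption, and the random forest imputation and XGBoost fitting that follow in the study are not reproduced here.

```python
def missing_rate(column):
    """Fraction of entries in a column that are missing (None)."""
    if not column:
        return 0.0
    return sum(value is None for value in column) / len(column)


def select_predictors(table, max_missing=0.30):
    """Keep columns whose missing rate is strictly below max_missing."""
    return {
        name: column
        for name, column in table.items()
        if missing_rate(column) < max_missing
    }


# Hypothetical toy table, not Jackson Heart Study data.
table = {
    "age": [54, 61, None, 47, 70],           # 20% missing, kept
    "bmi": [None, None, 28.1, 30.4, None],   # 60% missing, dropped
    "sbp": [128, 141, 135, 119, 150],        # 0% missing, kept
}
kept = select_predictors(table, max_missing=0.30)
print(sorted(kept))
```

The retained columns would then go to an imputation step before model fitting; varying `max_missing` is one way to compare inclusion rules of the kind the abstract describes.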
We investigate the photonic bandgaps in graphene-pair arrays. Graphene sheets are embedded in a bulk substrate to form a periodic graphene photonic crystal. The compound system exhibits a photonic band structure when light impinges on it. Multiple stopbands appear as the frequency of the incident light is varied. The widths of the stopbands and their central frequencies can be modulated through the graphene chemical potential. The number of stopbands decreases as the spatial period of the graphene pairs increases. In addition, two full passbands are realized in the parameter space composed of the incident angle and the light frequency. This investigation has potential applications in tunable multi-stopband filters.