Background The human immunodeficiency virus type 1 (HIV-1) aspartic protease is an important enzyme owing to its essential role in viral development, and HIV-1 is the causative agent of the deadly disease known as acquired immune deficiency syndrome (AIDS). Developing HIV-1 protease inhibitors can help in understanding the specificity of substrates, and such inhibitors can restrain the replication of HIV-1 and thus counter AIDS. However, experimental methods for identifying HIV-1 protease cleavage sites are generally time-consuming and labor-intensive. Therefore, using computational methods to predict cleavage sites has become highly desirable. Results In this study, we propose a prediction method in which sequence, structural, and physicochemical features are incorporated in various machine learning algorithms. Then, a bidirectional stepwise selection algorithm is incorporated in feature selection to identify discriminative features. Further, only the selected features are calculated by various encoding schemes and used as input for decision trees, logistic regression, and artificial neural networks. Moreover, a more rigorous three-way data split procedure is applied to evaluate the objective performance of cleavage site prediction. Four benchmark datasets collected from previous studies are used to evaluate the predictive performance. Conclusions Experimental results showed that combinations of sequence, structural, and physicochemical features performed better than any single feature type for identifying HIV-1 protease cleavage sites. In addition, incorporating stepwise feature selection is effective in identifying interpretable biological features that depict the specificity of the substrates. Moreover, artificial neural networks perform significantly better than the other two classifiers. 
Finally, the proposed method achieved 80.0% ~ 97.4% in accuracy and 0.815 ~ 0.995 evaluated by independent test sets in a three-way data split procedure. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1337-6) contains supplementary material, which is available to authorized users.
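The three-way data split mentioned in this abstract can be illustrated with a minimal Python sketch. The 60/20/20 fractions, the fixed seed, and the function name `three_way_split` are assumptions made for illustration; the abstract does not state the proportions or the implementation used in the study.

```python
import random


def three_way_split(items, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    The fractions are illustrative assumptions, not values from the study.
    """
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # everything left over
    return train, valid, test


train, valid, test = three_way_split(range(100))
print(len(train), len(valid), len(test))
```

In such a split, the training set would typically be used to fit a classifier, the validation set to choose features or models, and the test set only once for the final performance estimate.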
Background Preeclampsia and intrauterine growth restriction are placental dysfunction–related disorders (PDDs) that require a referral decision to be made within a certain time period. An appropriate prediction model should be developed for these diseases. However, previous models did not demonstrate robust performances and/or they were developed from datasets with highly imbalanced classes. Objective In this study, we developed a predictive model of PDDs by machine learning that uses features at 24-37 weeks’ gestation, including maternal characteristics, uterine artery (UtA) Doppler measures, soluble fms-like tyrosine kinase receptor-1 (sFlt-1), and placental growth factor (PlGF). Methods A public dataset was taken from a prospective cohort study that included pregnant women with PDDs (66/95, 69%) and a control group (29/95, 31%). Preliminary selection of features was based on a statistical analysis using SAS 9.4 (SAS Institute). We used Weka (Waikato Environment for Knowledge Analysis) 3.8.3 (The University of Waikato, Hamilton, NZ) to automatically select the best model using its optimization algorithm. We also manually selected the best of 23 white-box models. Models, including those from recent studies, were also compared by interval estimation of evaluation metrics. We used the Matthews correlation coefficient (MCC) as the main metric because it is not overoptimistic when evaluating the performance of a prediction model developed from a dataset with a class imbalance. Repeated 10-fold cross-validation was applied. Results The classification via regression model was chosen as the best model. Our model had a robust MCC (.93, 95% CI .87-1.00, vs .64, 95% CI .57-.71) and specificity (100%, 95% CI 100-100, vs 90%, 95% CI 90-90) compared to each metric of the best models from recent studies. The sensitivity of this model was not inferior (95%, 95% CI 91-100, vs 100%, 95% CI 92-100). 
The area under the receiver operating characteristic curve was also competitive (0.970, 95% CI 0.966-0.974, vs 0.987, 95% CI 0.980-0.994). Features in the best model were maternal weight, BMI, pulsatility index of the UtA, sFlt-1, and PlGF. The most important feature was the sFlt-1/PlGF ratio. This model used an M5P algorithm consisting of a decision tree and four linear models with different thresholds. Our dataset also compared favorably with those of the best recent studies in terms of the class balance and the size of the case class (66/95, 69%, vs 27/239, 11.3%). Conclusions Our model had a robust predictive performance. It was also developed to deal with the problem of a class imbalance. In the context of clinical management, this model may reduce maternal mortality, neonatal morbidity, and health care costs.
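A short Python sketch can illustrate why the MCC is less optimistic than accuracy when classes are imbalanced. The confusion-matrix counts below are hypothetical and are not the study's results; returning 0 when the denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an undefined MCC is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical classifier that labels every one of 90 cases and 10 controls as a case.
counts = dict(tp=90, tn=0, fp=10, fn=0)
print(accuracy(**counts))  # 0.9
print(mcc(**counts))       # 0.0
```

The accuracy looks high only because the majority class dominates, whereas the MCC of 0 reflects that the classifier has no discriminative ability.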
Background: The prevalence of nonalcoholic fatty liver disease is increasing over time worldwide, with similar trends to those of diabetes and obesity. A liver biopsy, the gold standard of diagnosis, is not favored due to its invasiveness. Meanwhile, noninvasive evaluation methods of fatty liver are still either very expensive or demonstrate poor diagnostic performances, thus limiting their applications. We developed neural network–based models to assess fatty liver and classify the severity using B-mode ultrasound (US) images. Methods: We followed the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines in reporting this study. In this retrospective study, we utilized B-mode US images from a consecutive series of patients to develop four-class, two-class, and three-class diagnostic prediction models. The images were eligible if confirmed by at least two gastroenterologists. We compared pretrained convolutional neural network models, namely visual geometry group (VGG)19, ResNet-50 v2, MobileNet v2, Xception, and Inception v2. For validation, we utilized 20% of the dataset, resulting in >100 images for each severity category. Results: There were 21,855 images from 2,070 patients classified as normal (N = 11,307), mild (N = 4,467), moderate (N = 3,155), or severe steatosis (N = 2,926). We used ResNet-50 v2, the best-performing model, as the final model. The areas under the receiver operating characteristic curves were 0.974 (mild steatosis vs others), 0.971 (moderate steatosis vs others), 0.981 (severe steatosis vs others), 0.985 (any severity vs normal), and 0.996 (moderate-to-severe steatosis/clinically abnormal vs normal-to-mild steatosis/clinically normal). Conclusion: Our deep learning models achieved comparable predictive performances to the most accurate, yet expensive, noninvasive diagnostic methods for fatty liver. Because of the discriminative ability, including for mild steatosis, significant impacts on clinical applications for fatty liver are expected. 
However, we need to overcome machine-dependent variation, motion artifacts, the lack of a second confirmation from other tools, and hospital-dependent regional bias.
Public health agencies have suggested nonpharmaceutical interventions to curb the spread of COVID-19 infections. This study intended to explore the information-seeking behavior and information needs on preventive measures for COVID-19 in the Philippine context. The search interests and related queries for COVID-19 terms and each of the preventive measures for the period from December 31, 2019 to April 6, 2020 were generated from Google Trends. The search terms employed for COVID-19 were coronavirus, ncov, covid-19, covid19 and “covid 19.” The search terms of the preventive measures considered for this study included “community quarantine”, “cough etiquette”, “face mask” or facemask, “hand sanitizer”, handwashing or “hand washing” and “social distancing.” Spearman’s correlation was computed among the new daily COVID-19 cases, the COVID-19 terms, and the different preventive measures. The relative search volume for the coronavirus disease showed an increase up to the pronouncement of the country’s first case of COVID-19. An uptrend was also evident after the country’s first local transmission was confirmed. A strong positive correlation (rs = .788, p < .001) was observed between the new daily cases and search interests for COVID-19. The search interests for the different measures and the new daily cases were also positively correlated. Similarly, the search interests for the different measures and the COVID-19 terms were all positively correlated. The search interests for “face mask” or facemask, “hand sanitizer” and handwashing or “hand washing” were more correlated with the search interest for COVID-19 than with the number of new daily COVID-19 cases. The search interests for “cough etiquette”, “social distancing” and “community quarantine” were more correlated with the number of new daily COVID-19 cases than with the search interest for COVID-19. 
The public sought additional details such as type, directions for proper use, and where to purchase, as well as do-it-yourself alternatives for personal protective items. Personal protective or community measures were expected to be accompanied by definitions and guidelines and to be available in translated versions. Google Trends could be a viable option to monitor and address the information needs of the public during a disease outbreak. Capturing and analyzing the search interests of the public could support the design and timely delivery of appropriate information essential to drive preventive measures during a disease outbreak.
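The Spearman's correlation used in this study can be sketched in plain Python as the Pearson correlation of the ranks, with tied values receiving their average rank. The daily case counts and relative search volumes below are hypothetical placeholders, not data from the study, and the function names are chosen for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical daily new cases and relative search volumes.
new_cases = [0, 1, 3, 10, 25, 40]
search_volume = [5, 8, 20, 35, 90, 100]
print(spearman(new_cases, search_volume))
```

Because the statistic depends only on ranks, any strictly increasing relationship between the two series gives a coefficient of 1, even when the relationship is not linear.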
We aimed to provide a framework that organizes the internal properties of a convolutional neural network (CNN) model using non-image data so that they are interpretable by humans. The interface was represented as an ontology map and network, obtained respectively by dimensional reduction and hierarchical clustering techniques. The framework applies to implementing a prediction model that either classifies a categorical outcome or estimates a numerical outcome, including but not limited to models using data from electronic health records. This pipeline harnesses the invention of CNN algorithms for non-image data while improving the depth of interpretability by a data-driven ontology. However, beyond its predictive ability, the DI-VNN is only for exploration, which requires further explanatory studies, and it needs a human user with specific competences in medicine, statistics, and machine learning to explore the DI-VNN with high confidence. The key stages consisted of data preprocessing, differential analysis, feature mapping, network architecture construction, model training and validation, and exploratory analysis.
We aimed to provide a resampling protocol for dimensional reduction resulting in a few latent variables. The protocol applies to, but is not limited to, developing a machine learning prediction model, where it improves the sample size relative to the number of candidate predictors. With this feature representation technique, one can improve generalization by preventing latent variables from overfitting the data used to conduct the dimensional reduction. However, this technique may require more computational capacity and time. The key stages consisted of derivation of latent variables from multiple resampling subsets, parameter estimation of latent variables in the population, and selection of latent variables transformed by the estimated parameters.
This protocol aimed to describe a procedure for transforming medical histories from electronic health records (EHRs) into historical rates by Kaplan-Meier (KM) estimation. The protocol applies to extracting features from real-world, time-varying EHR data for developing, among other uses, a machine learning prediction model. With this extraction technique, a machine can learn the medical history of a condition at each healthcare provider, as a differential quantity through time in terms of affecting a future health state, without needing to access the EHRs of other healthcare providers. However, this protocol needs a sufficient amount of longitudinal data from most subjects in the EHRs. The key stages consisted of time interval computation, historical rate derivation, and data transformation into historical rates.
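As background for the KM estimation this protocol relies on, the following is a minimal Python sketch of a generic Kaplan-Meier estimator. It is not the protocol's own procedure: the abstract does not specify how historical rates are derived from the estimate, and the durations, event flags, and the function name `kaplan_meier` are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per subject.
    events: 1 if the event was observed at that time, 0 if censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Collect all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_removed += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


# Hypothetical follow-up times; a 0 flag marks a censored subject.
for time, prob in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(time, round(prob, 3))
```

Censored subjects contribute to the number at risk until their follow-up ends but never trigger a drop in the curve, which is what lets the estimator use incomplete longitudinal records.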