Purpose To perform an in‐depth evaluation of current state‐of‐the‐art techniques for training neural networks, in order to identify appropriate approaches for small datasets. Method In total, 112,120 frontal‐view X‐ray images from the NIH ChestXray14 dataset were used in our analysis. Two tasks were studied: unbalanced multi‐label classification of 14 diseases, and binary classification of pneumonia vs non‐pneumonia. All datasets were randomly split into training, validation, and testing (70%, 10%, and 20%). Two popular convolutional neural networks (CNNs), DenseNet121 and ResNet50, were trained using PyTorch. We performed several experiments to test: (a) whether transfer learning using networks pretrained on ImageNet is of value to medical imaging/physics tasks (e.g., predicting toxicity from radiographic images after training on images from the internet), (b) whether using pretrained networks trained on problems that are similar to the target task helps transfer learning (e.g., using X‐ray pretrained networks for X‐ray target tasks), (c) whether freezing deep layers or updating all weights provides the optimal transfer learning strategy, (d) the best strategy for the learning rate policy, and (e) what quantity of data is needed in order to appropriately deploy these various strategies (N = 50 to N = 77 880). Results In the multi‐label problem, DenseNet121 needed at least 1600 patients to be comparable to, and 10 000 to outperform, radiomics‐based logistic regression. In classifying pneumonia vs non‐pneumonia, both CNN and radiomics‐based methods performed poorly when N < 2000. For small datasets (N < 2000), however, a significant boost in performance (>15% increase in AUC) comes from a good selection of the transfer learning dataset, dropout, cyclic learning rates, and freezing and unfreezing of deep layers as training progresses. In contrast, if sufficient data are available (N > 35 000), little or no tweaking is needed to obtain impressive performance.
While transfer learning using X‐ray images from other anatomical sites improves performance, we observed a similar boost from using networks pretrained on ImageNet. Having source images from the same anatomical site, however, outperforms every other methodology, by up to 15%. In this case, DL models can be trained with as little as N = 50. Conclusions While training DL models on small datasets (N < 2000) is challenging, no tweaking is necessary for bigger datasets (N > 35 000). Using transfer learning with images from the same anatomical site can yield remarkable performance on new tasks with as few as N = 50. Surprisingly, we did not find any advantage to using images from other anatomical sites over networks trained on ImageNet. This indicates that the features learned may not be as general as currently believed, and that performance decays rapidly when merely changing the anatomical site of the images.
The 6th CAPRI edition included new modelling challenges, such as the prediction of protein-peptide complexes, and the modelling of homo-oligomers and domain-domain interactions as part of the first joint CASP-CAPRI experiment. Other non-standard targets included the prediction of interfacial water positions and the modelling of the interactions between proteins and nucleic acids. We participated in all proposed targets of this CAPRI edition both as predictors and as scorers, with new protocols to efficiently use our docking and scoring scheme pyDock in a large variety of scenarios. In addition, we participated for the first time in the server section, with our recently developed webserver, pyDockWeb. Excluding the CASP-CAPRI cases, we submitted acceptable models (or better) for 7 out of the 18 evaluated targets as predictors, 4 out of the 11 targets as scorers, and 6 out of the 18 targets as servers. The overall success rates were below those in past CAPRI editions. This shows the challenging nature of this last edition, with many difficult targets for which no participant submitted a single acceptable model. Interestingly, we submitted acceptable models for 83% of the evaluated protein-peptide targets. As for the 25 cases of the CASP-CAPRI experiment, in which we used a larger variety of modelling techniques (template-based, symmetry restraints, literature information, etc.), we submitted acceptable models for 56% of the targets. In summary, this CAPRI edition showed that the pyDock scheme can be efficiently adapted to the increasing variety of problems that the protein interactions field is currently facing.
Aims: The present work aims to predict sensory astringency from wine chemical composition using machine learning algorithms. Material and results: Moristel grapes from different vine blocks and at different stages of ripening were collected. Eleven different wines were produced in 75 L tanks in triplicate, and their sensory attributes were described by the rate-all-that-apply method with a trained panel of participants. The polyphenolic composition of the wines was characterised by measuring the concentration and activity of tannins using UHPLC-UV/VIS, the mean degree of polymerisation (mDP), and the composition of tannins using thiolysis followed by UHPLC-MS. Conventional oenological parameters were analysed using FTIR and UV-Vis. Machine learning was applied to build models for predicting a wine's astringency from its chemical composition. The best model was obtained using the support vector regressor (radial kernel) algorithm, yielding a root-mean-square error (RMSE) of 0.190. Conclusions: The main variables of the astringency model were the percentage of procyanidins constituting tannins and the ethanol content, followed by eight other variables related to tannin structure and acidity. Significance of the study: These results increase the knowledge of chemical variables related to the perception of wine astringency and provide tools to control and optimise grape and wine production stages to modulate astringency and maximise the quality and consumer appeal of wines.
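The modelling approach described above, a support vector regressor with radial (RBF) kernel predicting an astringency rating from chemical composition, can be sketched with scikit-learn. The data below are synthetic stand-ins: the feature names (procyanidin percentage, ethanol, eight further tannin/acidity variables) mirror the kinds of variables the study reports, not the actual measurements, and the RMSE obtained here is not comparable to the reported 0.190.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 120  # synthetic "wines"

# Synthetic chemical-composition features (stand-ins for the measured ones).
procyanidin_pct = rng.uniform(40, 80, n)   # % procyanidins in tannins
ethanol = rng.uniform(11, 15, n)           # % v/v
other = rng.normal(size=(n, 8))            # eight tannin/acidity variables
X = np.column_stack([procyanidin_pct, ethanol, other])

# Synthetic astringency score: driven mainly by the two lead variables,
# echoing the study's finding, plus noise.
y = 0.05 * procyanidin_pct + 0.3 * ethanol + rng.normal(0, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# SVR with radial kernel; scaling matters for RBF distance computations.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_tr, y_tr)

rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:.3f}")
```

Feature standardisation before the RBF kernel is the main design point: without it, variables on large scales (e.g., percentages) dominate the kernel distances and drown out the smaller acidity-related predictors.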