Safecast is a volunteered geographic information (VGI) project in which members of the lay public use hand-held sensors to collect radiation measurements that are then made freely available under the Creative Commons CC0 license. However, the fidelity of Safecast data is uncertain, because the sensor kits are hand-assembled by volunteers with varying levels of technical proficiency and the sensors may not be deployed properly. Our objective was to validate Safecast data by comparing them with authoritative data collected by the U.S. Department of Energy (DOE) and the U.S. National Nuclear Security Administration (NNSA) in Fukushima Prefecture shortly after the Fukushima Daiichi nuclear power plant catastrophe. We found that the two data sets were highly correlated, although the DOE/NNSA observations were generally higher than the Safecast measurements. We concluded that this high correlation alone makes Safecast a viable data source for detecting and monitoring radiation.
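The comparison described above rests on two quantities: how strongly the paired readings co-vary (correlation) and whether one source reads systematically higher (bias). A minimal sketch follows, assuming the Safecast and DOE/NNSA readings have already been matched by location into equal-length pairs. The abstract does not say how pairing was done, and the function names and sample values here are hypothetical, for illustration only, not the study's data or method.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_bias(reference, test):
    """Mean of (reference - test); a positive value means the reference reads higher."""
    if len(reference) != len(test) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(r - t for r, t in zip(reference, test)) / len(reference)


# Hypothetical paired dose-rate readings (illustrative values only).
authoritative = [1.2, 2.5, 3.1, 4.8]  # stand-in for DOE/NNSA observations
volunteer = [1.0, 2.2, 2.9, 4.1]      # stand-in for co-located Safecast readings

r = pearson_r(authoritative, volunteer)
bias = mean_bias(authoritative, volunteer)
print(f"r = {r:.3f}, mean bias = {bias:.3f}")
```

With these toy values, the correlation comes out close to 1 and the bias is positive. That is the qualitative pattern the abstract reports: high agreement, with the authoritative readings generally higher.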
Deep learning has contributed to major advances in predicting protein structure from sequence, a fundamental problem in structural bioinformatics. With predictions now approaching crystallographic accuracy in some cases, and with accelerators such as GPUs and TPUs making inference with large models rapid, fast genome-scale structure prediction becomes an obvious aim. Leadership-class computing resources can be used to perform genome-scale protein structure prediction with state-of-the-art deep learning models, providing a wealth of new data for systems biology applications. Here we describe our efforts to deploy the AlphaFold2 program efficiently and at scale for full-proteome structure prediction on the Oak Ridge Leadership Computing Facility's resources, including the Summit supercomputer. We performed inference to produce predicted structures for 35,634 protein sequences, corresponding to three prokaryotic proteomes and one plant proteome, using fewer than 4,000 total Summit node hours, equivalent to using the majority of the supercomputer for one hour. We also designed an optimized structure refinement that reduced the time for the relaxation stage of the AlphaFold pipeline by more than tenfold for longer sequences. We demonstrate the types of analyses that can be performed on proteome-scale collections of sequences, including a search for novel quaternary structures, and discuss implications for functional annotation.
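The compute budget quoted above can be sanity-checked with simple arithmetic. The sketch below assumes Summit's full size of 4,608 compute nodes and treats the "under 4,000 node hours" figure as an upper bound. The function name is hypothetical. Per-sequence cost in practice varies strongly with sequence length, so the result is only a proteome-wide average.

```python
def budget_summary(node_hours, sequences, total_nodes):
    """Return (wall-clock hours if the whole machine were used,
    fraction of the machine needed for one hour,
    average node-minutes per sequence)."""
    if node_hours <= 0 or sequences <= 0 or total_nodes <= 0:
        raise ValueError("all inputs must be positive")
    hours_on_full_machine = node_hours / total_nodes
    fraction_of_machine_for_one_hour = node_hours / total_nodes
    node_minutes_per_sequence = node_hours * 60 / sequences
    return hours_on_full_machine, fraction_of_machine_for_one_hour, node_minutes_per_sequence


SUMMIT_NODES = 4608   # assumption: total compute nodes on Summit
NODE_HOURS = 4000     # upper bound quoted in the abstract
SEQUENCES = 35634     # sequences predicted, per the abstract

hours, fraction, per_seq = budget_summary(NODE_HOURS, SEQUENCES, SUMMIT_NODES)
print(f"~{hours * 60:.0f} min on the full machine; "
      f"{fraction:.0%} of nodes for one hour; "
      f"~{per_seq:.1f} node-minutes per sequence on average")
```

Under these assumptions, 4,000 node hours is about 87% of Summit's nodes for one hour. That is consistent with "the majority of the supercomputer for one hour," and it works out to roughly 6 to 7 node-minutes per sequence on average.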