Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability remains a concern. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data-sharing.
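To illustrate what "only sharing numerical model updates" can look like in practice, the following is a minimal sketch of one aggregation round. It assumes sample-size-weighted averaging of per-site parameters (commonly called federated averaging), which the abstract does not confirm as the aggregation rule used in the study. The function name `federated_average` and the flat-list parameter representation are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one consensus vector.

    Assumption: each site contributes in proportion to its local sample
    count. Only the parameter vectors and counts leave a site; the raw
    data never does.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site, and at least one site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(site_weights[0])
    consensus = [0.0] * dim
    for weights, n_samples in zip(site_weights, site_sizes):
        if len(weights) != dim:
            raise ValueError("all sites must share the same model shape")
        for i, value in enumerate(weights):
            # Weighted contribution of this site to parameter i.
            consensus[i] += value * n_samples / total
    return consensus


# Two hypothetical sites: the larger site pulls the consensus toward its values.
consensus = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(consensus)
```

In a full training loop, each site would train locally starting from the consensus, send back its updated parameters, and the round would repeat; that loop is omitted here.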
Background
Surface plasmon resonance is a label-free biophysical technique that is widely used in investigating biomolecular interactions, including protein-protein, protein-DNA, and protein-small molecule binding. Surface plasmon resonance is a very powerful tool in different stages of small molecule drug development and antibody characterization. Both academic institutions and the pharmaceutical industry extensively utilize this method for screening and validation studies involving direct molecular interactions. In most applications of the surface plasmon resonance technology, one of the studied molecules is immobilized on a microchip, while the second molecule is delivered through a microfluidic system over the immobilized molecules. Changes in total mass on the chip surface are recorded in real time as an indicator of the molecular interactions.
Main body
The quality and accuracy of surface plasmon resonance data depend on experimental variables, including buffer composition, type of sensor chip, coupling chemistry of molecules on the sensor surface, and surface regeneration conditions. These technical details are generally included in the materials and methods sections of published manuscripts and are not easily accessible using common internet search engines or PubMed. Herein, we introduce a surface plasmon resonance database, www.sprdatabase.info, that contains technical details extracted from 5140 publications with surface plasmon resonance data. We also provide an analysis of experimental conditions preferred by different laboratories. These experimental variables can be searched within the database and can help future users of this technology to design better experiments.
Conclusion
Amine coupling and CM5 chips were the most common methods used for immobilizing proteins in surface plasmon resonance experiments. However, a number of different chips, capture methods and buffer conditions were used by multiple investigators. We predict that the database will significantly help the scientific community using this technology and hope that users will provide feedback to improve and expand the database indefinitely. Publicly available information in the database can save a great amount of time and resources by assisting initial optimization and troubleshooting of surface plasmon resonance experiments.
Malignancy of the brain and CNS is unfortunately a common diagnosis. A large subset of these lesions tends to be high grade tumors, which portend poor prognoses and low survival rates, and are estimated to be the tenth leading cause of death worldwide. The complex nature of the brain tissue environment in which these lesions arise offers a rich opportunity for translational research. Magnetic Resonance Imaging (MRI) can provide a comprehensive view of the abnormal regions in the brain; therefore, its application in translational brain cancer research is considered essential for the diagnosis and monitoring of disease. Recent years have seen rapid growth in the field of radiogenomics, especially in cancer, and scientists have been able to successfully integrate the quantitative data extracted from medical images (also known as radiomics) with genomics to answer new and clinically relevant questions. In this paper, we took raw MRI scans from the REMBRANDT data collection in the public domain, and performed volumetric segmentation to identify subregions of the brain. Radiomic features were then extracted to represent the MRIs in a quantitative yet summarized format. This resulting dataset now enables further biomedical and integrative data analysis, and is being made public via the NeuroImaging Tools & Resources Collaboratory (NITRC) repository (https://www.nitrc.org/projects/rembrandt_brain/).
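As a rough illustration of how a segmentation can be summarized into per-region quantitative features, the sketch below computes voxel count, volume, and mean intensity for each labeled subregion. It is not the feature set or pipeline used for the REMBRANDT dataset. The function name `region_features`, the flattened-list inputs, the convention that label 0 means background, and the `voxel_volume_mm3` parameter are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Sequence


def region_features(
    intensities: Sequence[float],
    labels: Sequence[int],
    voxel_volume_mm3: float = 1.0,
) -> Dict[int, Dict[str, float]]:
    """Summarize each labeled subregion of a flattened scan.

    `intensities` and `labels` are parallel sequences with one entry per
    voxel. Label 0 is treated as background and skipped (an assumption).
    """
    if len(intensities) != len(labels):
        raise ValueError("intensities and labels must have the same length")

    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for value, label in zip(intensities, labels):
        if label == 0:
            continue  # background voxel, not part of any subregion
        sums[label] += value
        counts[label] += 1

    return {
        label: {
            "voxels": counts[label],
            "volume_mm3": counts[label] * voxel_volume_mm3,
            "mean_intensity": sums[label] / counts[label],
        }
        for label in counts
    }


# Toy example: five voxels, two labeled subregions, 2 mm^3 per voxel.
features = region_features([10, 20, 30, 40, 50], [0, 1, 1, 2, 2], voxel_volume_mm3=2.0)
print(features)
```

Real radiomic feature sets also include shape and texture descriptors; this sketch covers only the simplest first-order summaries.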