Professional sleep societies have identified a need for strategic research in multiple areas that may benefit from access to, and aggregation of, large multidimensional datasets. Technological advances provide opportunities to extract and analyze physiological signals and other biomedical information from datasets of unprecedented size, heterogeneity, and complexity. The National Institutes of Health has implemented a Big Data to Knowledge (BD2K) initiative that aims to develop and disseminate state-of-the-art big data access tools and analytical methods. The National Sleep Research Resource (NSRR) is a new National Heart, Lung, and Blood Institute resource designed to provide big data resources to the sleep research community. The NSRR is a web-based data portal that aggregates, harmonizes, and organizes sleep and clinical data from thousands of individuals studied as part of cohort studies or clinical trials, and it provides users a suite of tools to facilitate data exploration and visualization. Each deidentified study record minimally includes the summary results of an overnight sleep study; annotation files with scored events; the raw physiological signals from the sleep record; and available clinical and physiological data. NSRR is designed to be interoperable with other public data resources, such as the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) data, and to support analysis with methods provided by the Research Resource for Complex Physiological Signals (PhysioNet). This article reviews the key objectives, challenges, and operational solutions for addressing big data opportunities for sleep research in the context of the national sleep research agenda. It provides information to facilitate further interactions of the user community with NSRR, a community resource.
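The harmonization step described above — mapping cohort-specific variable names onto a common data dictionary — can be sketched as follows. This is a minimal illustrative sketch: the variable names, the mapping, and the records are hypothetical and are not NSRR's actual data dictionary.

```python
# Illustrative sketch of cross-cohort variable harmonization.
# The mapping and records below are hypothetical examples,
# not the actual NSRR data dictionary.

COMMON_NAMES = {
    "ahi": "ahi",                # apnea-hypopnea index, already harmonized
    "AHI4": "ahi",               # hypothetical cohort-specific spelling
    "bmi": "bmi",
    "body_mass_index": "bmi",
}

def harmonize(record: dict) -> dict:
    """Rename cohort-specific keys to the common dictionary, dropping unknowns."""
    return {COMMON_NAMES[k]: v for k, v in record.items() if k in COMMON_NAMES}

cohort_a = {"AHI4": 12.3, "body_mass_index": 27.1, "site": "A"}
cohort_b = {"ahi": 8.0, "bmi": 31.4}

harmonized = [harmonize(r) for r in (cohort_a, cohort_b)]
```

After harmonization, records from both hypothetical cohorts share the same keys and can be pooled for cross-cohort queries.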
EpSO plays a critical role in informatics tools for epilepsy patient care and multi-center clinical research.
The Cloudwave platform is a new approach to leveraging large-scale electrophysiological data for advancing multicenter clinical research.
Inconsistencies in the preparation of histology slides and whole-slide images (WSIs) may lead to challenges with subsequent image analysis and machine learning approaches for interrogating the WSI. These variabilities are especially pronounced in multicenter cohorts, where batch effects (i.e., systematic technical artifacts unrelated to biological variability) may introduce biases into machine learning algorithms. To date, manual quality control (QC) has been the de facto standard for dataset curation, but it remains highly subjective and is too laborious in light of the increasing scale of tissue slide digitization efforts. This study aimed to evaluate a computer-aided QC pipeline for facilitating a reproducible QC process of WSI datasets. An open-source tool, HistoQC, was applied to the Nephrotic Syndrome Study Network (NEPTUNE) digital pathology repository to identify image artifacts and compute quantitative metrics describing visual attributes of WSIs. A comparison of inter-reader concordance between HistoQC-aided and unaided curation was performed to quantify improvements in curation reproducibility. HistoQC metrics were additionally employed to quantify the presence of batch effects within NEPTUNE WSIs. Of the 1814 WSIs (458 H&E, 470 PAS, 438 silver, 448 trichrome) from n = 512 cases considered in this study, approximately 9% (163) were identified as unsuitable for subsequent computational analysis. The concordance in the identification of these WSIs among computational pathologists rose from moderate (Gwet's AC1 range 0.43 to 0.59 across stains) to excellent (Gwet's AC1 range 0.79 to 0.93 across stains) agreement when aided by HistoQC. Furthermore, statistically significant batch effects (p < 0.001) were discovered in the NEPTUNE WSI dataset. Taken together, our findings strongly suggest that quantitative QC is a necessary step in the curation of digital pathology cohorts. © 2020 The Pathological Society of Great Britain and Ireland.
Published by John Wiley & Sons, Ltd.
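The Gwet's AC1 coefficient used above to quantify inter-reader concordance can, for the simple case of two raters making a binary suitable/unsuitable call, be computed with a short sketch; the rating vectors below are hypothetical, not the NEPTUNE data.

```python
def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters and a binary (0/1) rating.

    AC1 = (Pa - Pe) / (1 - Pe), where Pa is the observed proportion of
    agreement and Pe = 2*pi*(1 - pi), with pi the mean proportion of
    positive ratings across the two raters.
    """
    n = len(rater_a)
    pa = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    pi = (sum(rater_a) + sum(rater_b)) / (2 * n)
    pe = 2 * pi * (1 - pi)
    return (pa - pe) / (1 - pe)

# Hypothetical example: two raters flagging five WSIs as unsuitable (1) or not (0).
ac1 = gwet_ac1([1, 1, 0, 0, 1], [1, 1, 0, 1, 1])
```

Unlike Cohen's kappa, AC1 remains stable when the two rating categories are highly imbalanced, which is why it suits QC tasks where most slides pass.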
Data-driven neuroscience research is providing new insights into the progression of neurological disorders and supporting the development of improved treatment approaches. However, the volume, velocity, and variety of neuroscience data generated by sophisticated recording instruments and acquisition methods have exacerbated the limited scalability of existing neuroinformatics tools. This makes it difficult for neuroscience researchers to effectively leverage the growing multi-modal neuroscience data to advance research in serious neurological disorders, such as epilepsy. We describe the development of the Cloudwave data flow, which uses new data partitioning techniques to store and analyze electrophysiological signals in distributed computing infrastructure. The Cloudwave data flow uses the MapReduce parallel programming model to implement an integrated signal data processing pipeline that scales with the large volumes of data generated at high velocity. Using an epilepsy domain ontology together with an epilepsy-focused extensible data representation format called the Cloudwave Signal Format (CSF), the data flow addresses the challenge of data heterogeneity and is interoperable with existing neuroinformatics data representation formats, such as HDF5. The scalability of the Cloudwave data flow was evaluated using a 30-node cluster installed with the open-source Hadoop software stack. The results demonstrate that the Cloudwave data flow can process increasing volumes of signal data by leveraging Hadoop Data Nodes to reduce the total data processing time. The Cloudwave data flow is a template for developing highly scalable neuroscience data processing pipelines using MapReduce algorithms to support a variety of user applications.
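The map/reduce pattern behind such a pipeline can be illustrated in miniature: the map step computes a partial result per signal partition (on Hadoop, in parallel across Data Nodes), and the reduce step merges the partials. This is a hedged, single-process sketch of the pattern only — the partition size and the per-partition statistic are hypothetical choices, not the Cloudwave implementation.

```python
from functools import reduce

def map_partition(partition):
    """Map step: per-partition partial sums toward a global mean."""
    return (sum(partition), len(partition))

def reduce_partials(acc, partial):
    """Reduce step: merge (sum, count) pairs from all partitions."""
    return (acc[0] + partial[0], acc[1] + partial[1])

# Hypothetical signal record, partitioned into fixed-size chunks.
signal = [0.1, 0.4, -0.2, 0.3, 0.0, 0.5]
partitions = [signal[i:i + 2] for i in range(0, len(signal), 2)]

partials = [map_partition(p) for p in partitions]  # parallel on a real cluster
total, count = reduce(reduce_partials, partials, (0.0, 0))
mean = total / count
```

Because the reduce operation is associative, partial results can be combined in any order, which is what lets the cluster scale by adding nodes.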
Kidney fibrosis constitutes the shared final pathway of nearly all chronic nephropathies, but biomarkers for the non-invasive assessment of kidney fibrosis are currently not available. To address this, we characterize five candidate biomarkers of kidney fibrosis: Cadherin-11 (CDH11), Sparc-related modular calcium binding protein-2 (SMOC2), Pigment epithelium-derived factor (PEDF), Matrix-Gla protein, and Thrombospondin-2. Gene expression profiles in single-cell and single-nucleus RNA-sequencing (sc/snRNA-seq) datasets from rodent models of fibrosis and human chronic kidney disease (CKD) were explored, and Luminex-based assays for each biomarker were developed. Plasma and urine biomarker levels were measured in two independent prospective cohorts of CKD: the Boston Kidney Biopsy Cohort, a cohort of individuals with biopsy-confirmed semiquantitative assessment of kidney fibrosis, and the Seattle Kidney Study, a cohort of patients with common forms of CKD. Ordinal logistic regression and Cox proportional hazards regression models were used to test associations of biomarkers with interstitial fibrosis and tubular atrophy and with progression to end-stage kidney disease and death, respectively. Sc/snRNA-seq data confirmed cell-specific expression of biomarker genes in fibroblasts. After multivariable adjustment, higher levels of plasma CDH11, SMOC2, and PEDF and urinary CDH11 and PEDF were significantly associated with increasing severity of interstitial fibrosis and tubular atrophy in the Boston Kidney Biopsy Cohort. In both cohorts, higher levels of plasma and urinary SMOC2 and urinary CDH11 were independently associated with progression to end-stage kidney disease. Higher levels of urinary PEDF were associated with end-stage kidney disease in the Seattle Kidney Study, with a similar signal in the Boston Kidney Biopsy Cohort, although the latter narrowly missed statistical significance.
Thus, we identified CDH11, SMOC2, and PEDF as promising non-invasive biomarkers of kidney fibrosis.
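The Cox proportional hazards model used in these survival analyses is fit by maximizing a partial likelihood over event times. As a minimal sketch, the negative log partial likelihood for a single covariate can be written in plain Python; the toy follow-up data below are hypothetical and are not from either cohort.

```python
import math

def neg_log_partial_likelihood(beta, data):
    """Negative log partial likelihood for a one-covariate Cox model.

    data: list of (time, event, x), where event is 1 if the endpoint
    (e.g., end-stage kidney disease) occurred at that time.
    Each event contributes exp(beta*x_i) divided by the sum of
    exp(beta*x_j) over subjects still at risk (time_j >= time_i).
    """
    nll = 0.0
    for t_i, event_i, x_i in data:
        if not event_i:
            continue  # censored observations enter only through risk sets
        risk = sum(math.exp(beta * x_j) for t_j, _, x_j in data if t_j >= t_i)
        nll -= beta * x_i - math.log(risk)
    return nll

# Hypothetical toy data: (follow-up time, event indicator, biomarker level)
toy = [(1, 1, 1.0), (2, 1, 0.0), (3, 0, 1.0)]
nll_at_zero = neg_log_partial_likelihood(0.0, toy)
```

In practice such models are fit with established packages (e.g., survival libraries in R or Python) that also handle ties and multivariable adjustment; this sketch only shows the quantity being optimized.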
Abstract. The increasing capability and sophistication of biomedical instruments have led to the rapid generation of large volumes of disparate data that are often characterized as biomedical "big data". Effective analysis of biomedical big data is providing new insights to advance healthcare research, but it is difficult to efficiently manage big data without a conceptual model, such as an ontology, to support storage, query, and analytical functions. In this paper, we describe the Cloudwave platform, which uses a domain ontology to support optimal data partitioning, efficient network transfer, visualization, and querying of big data in the neurological disease domain. The domain ontology is used to define a new JSON-based Cloudwave Signal Format (CSF) for neurology signal data. A comparative evaluation of the ontology-based CSF against existing data formats demonstrates that it significantly reduces data access time for query and visualization of large-scale signal data.
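A JSON-based signal format of this kind serializes channel metadata, samples, and annotations into one self-describing record that can be transferred over the network and parsed directly by a visualization client. The sketch below illustrates the idea only — the field names are hypothetical and are not the published CSF schema.

```python
import json

# Hypothetical sketch of a JSON-based signal segment in the spirit of the
# Cloudwave Signal Format (CSF); the field names are illustrative, not
# the actual CSF schema.
segment = {
    "recording_id": "study-001",
    "channel": "EEG-C3",
    "sampling_rate_hz": 256,
    "start_offset_s": 30.0,
    "samples": [12.5, 13.1, 12.9, 12.7],
    "annotations": [{"onset_s": 30.5, "label": "spike"}],
}

encoded = json.dumps(segment)   # serialize for network transfer
decoded = json.loads(encoded)   # parse on the client for visualization
```

Because the metadata travels with each segment, a client can render or query a partition without fetching the whole recording, which is the access pattern the partitioned storage is designed for.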