Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations with genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. By text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19-year period. The two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark), so the resulting stratification is not driven by ethnic differences but rather by inherently dissimilar progression patterns and lifestyle-related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal glycemic dysregulation patterns, temporal co-occurrences of comorbidities, and associations with single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.
Gene copy-number changes influence phenotypes through gene-dosage alteration and subsequent changes of protein complex stoichiometry. Human trisomies, in which gene copy numbers are increased uniformly over entire chromosomes, provide generic cases for studying these relationships. In most trisomies, gene and protein level alterations have fatal consequences. We used genome-wide protein-protein interaction data to identify chromosome-specific patterns of protein interactions. We found that some chromosomes, chromosome 21 in particular, encode proteins that interact infrequently with each other. We combined the protein interaction data with transcriptome data from human brain tissue to investigate how this pattern of global interactions may affect cellular function. We identified highly connected proteins that also had coordinated gene expression. These proteins were associated with important neurological functions affecting the characteristic phenotypes of Down syndrome and have previously been validated in mouse knockout experiments. Our approach is general and applicable to other gene-dosage changes, such as arm-level amplifications in cancer.