This work discusses bioinformatics and experimental approaches to explore the human proteome, a constellation of proteins expressed in different tissues and organs. As the human proteome is not a static entity, it is necessary to estimate the number of different protein species (proteoforms) and to measure the number of copies of the same protein in a specific tissue. Here, a meta-analysis of the neXtProt knowledge base is proposed for theoretical prediction of the number of different proteoforms that arise from alternative splicing (AS), single amino acid polymorphisms (SAPs), and posttranslational modifications (PTMs). Three possible cases are considered: (1) PTMs and SAPs appear exclusively in the canonical sequences of proteins, but not in splice variants; (2) PTMs and SAPs can occur both in proteins encoded by canonical sequences and in splice variants; (3) all modification types (AS, SAP, and PTM) occur as independent events. Experimental validation of proteoforms is limited by the analytical sensitivity of proteomic technology. A bell-shaped distribution histogram was generated for proteins encoded by a single chromosome, with the estimation of copy numbers in plasma, liver, and the HepG2 cell line. The proposed metabioinformatics approaches can be used for estimation of the number of different proteoforms for any group of protein-coding genes.
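The three cases can be illustrated with a minimal counting sketch. The formulas below are an assumption made for illustration only, not the formulas used in the work: each proteoform is taken to carry at most one SAP and at most one PTM, and the function name and per-gene counts are hypothetical.

```python
def count_proteoforms(n_as, n_sap, n_ptm):
    """Illustrative proteoform counts for one gene under three scenarios.

    n_as  -- number of splice variants in addition to the canonical sequence
    n_sap -- number of annotated single amino acid polymorphisms
    n_ptm -- number of annotated posttranslational modifications

    ASSUMPTION: a proteoform carries at most one SAP and at most one PTM.
    This is one possible reading of the three cases, not the source's model.
    """
    # Case 1: SAPs and PTMs occur only on the canonical sequence.
    case1 = 1 + n_as + n_sap + n_ptm
    # Case 2: every sequence form (canonical or splice variant) may be
    # unmodified, carry one SAP, or carry one PTM.
    case2 = (1 + n_as) * (1 + n_sap + n_ptm)
    # Case 3: AS, SAP, and PTM are chosen independently of each other.
    case3 = (1 + n_as) * (1 + n_sap) * (1 + n_ptm)
    return case1, case2, case3


if __name__ == "__main__":
    # Hypothetical gene: 2 splice variants, 3 SAPs, 4 PTMs.
    print(count_proteoforms(n_as=2, n_sap=3, n_ptm=4))
```

Summing such per-gene counts over a chromosome or any other gene group would give a group-level estimate under the same assumptions.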
Age-related physiological changes in humans are linearly associated with age. Naturally, linear combinations of physiological measures trained to estimate chronological age have recently emerged as a practical way to quantify aging in the form of biological age. In this work, we used one-week-long physical activity records from the 2003–2006 National Health and Nutrition Examination Survey (NHANES) to compare three increasingly accurate biological age models: the unsupervised Principal Component Analysis (PCA) score, a multivariate linear regression, and a state-of-the-art deep convolutional neural network (CNN). We found that the supervised approaches produce better chronological age estimations at the expense of a loss of the association between the aging acceleration and all-cause mortality. Consequently, we turned to the NHANES death register directly and introduced a novel way to train parametric proportional hazards models suitable for out-of-the-box implementation with any modern machine learning software. As a demonstration, we produced a separate deep CNN for mortality risk prediction that outperformed all of the biological age models and a simple linear proportional hazards model. Altogether, our findings demonstrate the emerging potential of combined wearable sensors and deep learning technologies for applications involving continuous health risk monitoring and real-time feedback to patients and care providers.
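A parametric proportional hazards model can be trained with a generic optimizer once its likelihood is written as an ordinary loss function. The sketch below assumes a Gompertz baseline hazard, h(t | x) = exp(log_h0 + r(x) + gamma * t), where r(x) is the model's predicted log-risk; the choice of baseline, the function name, and all parameter names are assumptions for illustration and are not taken from the work.

```python
import math


def gompertz_ph_nll(log_risk, age_start, age_end, died, gamma, log_h0):
    """Mean negative log-likelihood of a Gompertz proportional hazards model.

    log_risk  -- predicted log-hazard ratio per participant (model output)
    age_start -- age at the start of follow-up
    age_end   -- age at death or at censoring
    died      -- 1 if death was observed, 0 if the record is censored
    gamma     -- Gompertz exponent (ASSUMED baseline form)
    log_h0    -- log of the baseline hazard at age zero
    """
    total = 0.0
    for r, t0, t1, d in zip(log_risk, age_start, age_end, died):
        scale = math.exp(log_h0 + r)
        # Cumulative hazard accumulated between age_start and age_end.
        cum_hazard = scale * (math.exp(gamma * t1) - math.exp(gamma * t0)) / gamma
        # Log-hazard at the end of follow-up counts only for observed deaths.
        log_hazard_end = log_h0 + r + gamma * t1
        total += cum_hazard - d * log_hazard_end
    return total / len(log_risk)


if __name__ == "__main__":
    # Two hypothetical participants followed from age 60.
    print(gompertz_ph_nll([0.2, -0.1], [60, 60], [68, 70], [1, 0],
                          gamma=0.085, log_h0=-10.0))
```

Because the loss is a smooth function of the predicted log-risk, the same expression could be written in any automatic differentiation framework and minimized by gradient descent.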
We performed a systematic evaluation of the relationships between locomotor activity and signatures of frailty, morbidity, and mortality risks using physical activity records from the 2003–2006 National Health and Nutrition Examination Survey (NHANES) and UK BioBank (UKB). We proposed a statistical description of the locomotor activity tracks and transformed the provided time series into vectors representing physiological states for each participant. The Principal Component Analysis of the transformed data revealed a winding trajectory with distinct segments corresponding to successive human development stages. The extended linear phase starts at 35–40 years of age and is associated with the exponential increase of mortality risks according to the Gompertz mortality law. We characterized the distance traveled along the aging trajectory as a natural measure of biological age and demonstrated its significant association with frailty and hazardous lifestyles, along with the remaining lifespan and healthspan of an individual. The biological age explained most of the variance of the log-hazard ratio that was obtained by fitting directly to mortality and the incidence of chronic diseases. Our findings highlight the intimate relationship between the supervised and unsupervised signatures of the biological age and frailty, a consequence of the low intrinsic dimensionality of the aging dynamics.
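For reference, the Gompertz mortality law mentioned above can be written as follows. The symbols M_0 (initial mortality rate), Γ (Gompertz exponent), and t_d (mortality rate doubling time) are notation introduced here for illustration.

```latex
M(t) = M_0 \, e^{\Gamma t},
\qquad
\ln M(t) = \ln M_0 + \Gamma t,
\qquad
t_d = \frac{\ln 2}{\Gamma}
```

The log-hazard is linear in age, which is why a quantity that grows linearly along the aging trajectory can track the log-hazard ratio. In adult humans the doubling time t_d is commonly quoted as roughly eight years.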
Clustered regularly interspaced short palindromic repeat (CRISPR) is a bacterial immunity system that requires a perfect sequence match between the CRISPR cassette spacer and a protospacer in invading DNA for exclusion of foreign genetic elements. CRISPR cassettes are hypervariable, possibly reflecting different exposure of strains of the same species to foreign genetic elements. Here, we determined CRISPR cassette sequences of two Xanthomonas oryzae strains and found that one of the strains remains sensitive to phage Xop411 despite carrying a cassette that has a spacer exactly matching a fragment of the Xop411 genome. To explain this apparent paradox, we identified X. oryzae CRISPR spacers of likely phage origin and defined a consensus sequence of a motif adjacent to X. oryzae phage protospacers. Our analysis revealed that the Xop411 protospacer that matches the CRISPR spacer has this motif mutated, which likely explains the phage's ability to infect its host. While similar observations were made previously with Streptococcus thermophilus and its phages, the conserved motif in X. oryzae phages is located on a protospacer side opposite to the S. thermophilus phages' motif. The results thus point to a considerable degree of variety of CRISPR-mediated phage resistance mechanisms in different bacteria.
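The logic of the paradox can be sketched as a two-part check: a spacer must match a protospacer, and the motif adjacent to that protospacer must be intact. The sequences, the motif, and its position below are hypothetical placeholders, not the X. oryzae consensus; only the forward strand is scanned in this simplified sketch.

```python
def find_protospacers(genome, spacer, motif, side="upstream"):
    """Locate exact spacer matches and report whether the flanking motif is intact.

    genome -- phage genome sequence (forward strand only in this sketch)
    spacer -- CRISPR spacer sequence
    motif  -- ASSUMED consensus of the protospacer-adjacent motif
    side   -- "upstream" or "downstream" of the protospacer

    Returns a list of (position, flank, motif_intact) tuples.
    """
    hits = []
    start = genome.find(spacer)
    while start != -1:
        end = start + len(spacer)
        if side == "upstream":
            flank = genome[max(0, start - len(motif)):start]
        else:
            flank = genome[end:end + len(motif)]
        hits.append((start, flank, flank == motif))
        start = genome.find(spacer, start + 1)
    return hits


if __name__ == "__main__":
    spacer = "GATTACA"                    # hypothetical spacer
    wild_type = "AAATTCGATTACAGGG"        # motif TTC intact
    escape_mutant = "AAATACGATTACAGGG"    # motif mutated to TAC
    print(find_protospacers(wild_type, spacer, "TTC"))
    print(find_protospacers(escape_mutant, spacer, "TTC"))
```

Under this reading, a phage whose protospacer matches exactly but whose motif is mutated would be flagged as a likely escape variant.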
Twenty-nine human aqueous humor samples from patients with eye diseases such as cataract and glaucoma, with and without pseudoexfoliation syndrome, were characterized by LC-high resolution MS analysis. In total, 269 protein groups were identified at a 1% false discovery rate, including 32 groups that were not reported previously for this biological fluid. Because the samples were analyzed individually rather than pooled, it was possible to establish that 36 proteins were identified in all samples, comprising the constitutive proteome of the fluid. The most dominant molecular function of aqueous humor proteins as determined by GO analysis is endopeptidase inhibitor activity. Label-free protein quantification showed no significant difference between glaucoma and cataract aqueous humor proteomes. At the same time, we found a decrease in the level of apolipoprotein D as a marker of the pseudoexfoliation syndrome. The data are available from the ProteomeXchange repository (PXD002623).
Elucidation of new biomarkers and potential drug targets from high-throughput profiling data is a challenging task due to a limited number of available biological samples and questionable reproducibility of differential changes in cross-dataset comparisons. In this paper we propose a novel computational approach for drug target and biomarker discovery using comprehensive analysis of multiple expression profiling datasets. The new method relies on aggregation of individual profiling experiments combined with a leave-one-dataset-out validation approach. Aggregated datasets were studied using the Sub-Network Enrichment Analysis (SNEA) algorithm to find consistent, statistically significant key regulators within the global literature-extracted expression regulation network. These regulators were linked to the consistent differentially expressed genes. We have applied our approach to several publicly available human muscle gene expression profiling datasets related to Duchenne muscular dystrophy (DMD). In order to detect both enhanced and repressed processes, we considered up- and down-regulated genes separately. Applying the proposed approach to the search for regulators, we discovered a disturbance in the activity of several muscle-related transcription factors (e.g. MYOG and MYOD1) and of regulators of inflammation, regeneration, and fibrosis. Almost all SNEA-derived regulators of down-regulated genes (e.g. AMPK, TORC2, PPARGC1A) correspond to a single common pathway important for fast-to-slow twitch fiber type transition. We hypothesize that this process can affect the severity of DMD symptoms, making the corresponding regulators and downstream genes valuable candidates for potential drug targets and exploratory biomarkers.
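The leave-one-dataset-out idea can be sketched as follows: genes that change consistently in all but one dataset are selected, and the held-out dataset is used to check them. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data layout (dataset name mapped to per-gene log fold changes), the threshold, and the function names are hypothetical, and the SNEA step is not implemented.

```python
def consistent_genes(datasets, threshold=1.0):
    """Return (up, down) gene sets that pass the threshold in every dataset.

    datasets -- dict: dataset name -> dict of gene -> log fold change (ASSUMED layout)
    """
    names = list(datasets)
    shared = set.intersection(*(set(datasets[n]) for n in names))
    up = {g for g in shared if all(datasets[n][g] >= threshold for n in names)}
    down = {g for g in shared if all(datasets[n][g] <= -threshold for n in names)}
    return up, down


def leave_one_dataset_out(datasets, threshold=1.0):
    """For each held-out dataset, count how many consistent genes it confirms."""
    report = {}
    for held_out in datasets:
        training = {n: d for n, d in datasets.items() if n != held_out}
        # Up- and down-regulated genes are handled separately.
        up, down = consistent_genes(training, threshold)
        test = datasets[held_out]
        confirmed = sum(1 for g in up if test.get(g, 0.0) >= threshold)
        confirmed += sum(1 for g in down if test.get(g, 0.0) <= -threshold)
        report[held_out] = (confirmed, len(up) + len(down))
    return report


if __name__ == "__main__":
    # Hypothetical log fold changes for three datasets and three genes.
    data = {
        "A": {"G1": 2.0, "G2": -1.5, "G3": 1.2},
        "B": {"G1": 1.8, "G2": -2.0, "G3": -0.2},
        "C": {"G1": 2.5, "G2": -1.1, "G3": 1.4},
    }
    print(leave_one_dataset_out(data))
```

Each report entry gives the number of confirmed genes and the number selected from the remaining datasets, so a low ratio flags changes that do not reproduce across datasets.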
We collected 60 age-dependent transcriptomes for C. elegans strains including four exceptionally long-lived mutants (mean adult lifespan extended 2.2- to 9.4-fold) and three examples of lifespan-increasing RNAi treatments. Principal Component Analysis (PCA) reveals aging as a transcriptomic drift along a single direction, consistent across the vastly diverse biological conditions and coinciding with the first principal component, a hallmark of the criticality of the underlying gene regulatory network. We therefore expected that the organism’s aging state could be characterized by a single number closely related to vitality deficit or biological age. The “aging trajectory”, i.e. the dependence of the biological age on chronological age, is then a universal stochastic function modulated by the network stiffness, a macroscopic parameter reflecting the network topology and associated with the rate of aging. To corroborate this view, we used publicly available datasets to define a transcriptomic biomarker of age and observed that the rescaling of age by lifespan simultaneously brings together aging trajectories of transcription and survival curves. In accordance with the theoretical prediction, the limiting mortality value at the plateau agrees closely with the mortality rate doubling exponent estimated at the cross-over age near the average lifespan. Finally, we used the transcriptomic signature of age to identify possible life-extending drug compounds, successfully tested a handful of the top-ranking molecules in C. elegans survival assays, and achieved up to a 30% extension of mean lifespan.
Proteogenomics is based on the use of customized genome or RNA sequencing databases for interrogation of shotgun proteomics data in search of proteome‐level evidence of genome variations or RNA editing. In this work, the products of adenosine‐to‐inosine RNA editing in human and murine brain proteomes are identified using publicly available brain proteome LC‐MS/MS datasets and an RNA editome database compiled from several sources. After filtering of false‐positive results, 20 and 37 sites of editing in proteins belonging to 14 and 32 genes are identified for murine and human brain proteomes, respectively. Eight sites of editing identified with high spectral counts overlapped between human and mouse brain samples. Some of these sites, such as those in α‐amino‐3‐hydroxy‐5‐methyl‐4‐isoxazolepropionic acid (AMPA) glutamate receptors, CYFIP2, and coatomer alpha, have been previously reported using orthogonal methods. Also, differential editing between neurons and microglia is demonstrated in this work for some of the proteins from primary murine brain cell cultures. Because many edited sites are still not characterized functionally at the protein level, the results provide a necessary background for their further analysis in normal and diseased cells and tissues using targeted proteomic approaches.