Protein phosphorylation is a key mechanism for regulating protein function. However, the contribution of this modification to species divergence remains largely unknown. Here, we studied the evolution of mammalian phosphoregulation by comparing the human and mouse phosphoproteomes. We found that 84% of the positions phosphorylated in either species are conserved at the residue level. Twenty percent of these conserved sites are phosphorylated in both species, 2.5 times more than expected by chance alone, suggesting that purifying selection preserves phosphoregulation. However, we show that the majority of sites conserved at the residue level are differentially phosphorylated between species. These differences likely arise from false-negative identifications due to incomplete experimental coverage, from false-positive identifications and from non-functional sites. In addition, our results suggest that at least 5% of these sites are true differentially phosphorylated sites and may thus contribute to the divergence of phosphorylation networks between mouse and human, despite residue conservation between orthologous proteins. We also show that evolutionary turnover of phosphosites at adjacent positions (up to 40 amino acids apart) in human or mouse leads to an overestimation of the divergence in phosphoregulation between the two species. These adjacent sites tend to be phosphorylated by the same kinases, supporting the hypothesis that they are functionally redundant. Our results support the hypothesis that the evolutionary turnover of phosphorylation sites contributes to the divergence in phosphorylation profiles while preserving phosphoregulation. Overall, our study provides advanced analyses of mammalian phosphoproteomes and a framework for studying their contribution to phenotypic evolution.
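The adjacent-site turnover argument can be illustrated with a small sketch. It assumes, hypothetically, that the phosphosites of one orthologous protein pair are given as column indices in a pairwise alignment, and it treats a species-specific site as possibly "compensated" when the other species has its own species-specific site within 40 columns. The function name `classify_sites` and these matching rules are illustrative assumptions, not the study's actual pipeline.

```python
from typing import Dict, List, Set


def classify_sites(human: Set[int], mouse: Set[int], window: int = 40) -> Dict[str, List[int]]:
    """Hypothetical helper: split the phosphosites of one aligned ortholog pair.

    `human` and `mouse` hold alignment-column indices of phosphorylated
    residues (an assumed input format). A species-specific site is called
    "compensated" if the other species has a species-specific site within
    `window` columns, i.e. a candidate case of turnover to a nearby position.
    """
    shared = sorted(human & mouse)
    human_only = sorted(human - mouse)
    mouse_only = sorted(mouse - human)

    def has_neighbor(pos: int, others: List[int]) -> bool:
        # A linear scan is enough for the handful of sites in one protein.
        return any(abs(pos - other) <= window for other in others)

    return {
        "shared": shared,
        "human_only_compensated": [p for p in human_only if has_neighbor(p, mouse_only)],
        "human_only_unmatched": [p for p in human_only if not has_neighbor(p, mouse_only)],
        "mouse_only_compensated": [p for p in mouse_only if has_neighbor(p, human_only)],
        "mouse_only_unmatched": [p for p in mouse_only if not has_neighbor(p, human_only)],
    }


# Made-up example: the human site at column 50 and the mouse site at column 75
# are 25 columns apart, so both are flagged as possible turnover events,
# whereas columns 200 (human) and 400 (mouse) have no nearby counterpart.
result = classify_sites({10, 50, 200}, {10, 75, 400})
print(result)
```

Counting only the unmatched sites as divergent, rather than every species-specific site, is one way the overestimation described above could be reduced; a fuller analysis would presumably also check residue type and kinase motifs, which this sketch ignores.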
Triple negative breast cancer (TNBC) is one of the most aggressive forms of breast cancer (BC), with the highest mortality owing to its high rate of relapse, resistance to therapy and lack of an effective treatment. Various molecular approaches have been used to target TNBC, but with little success. Here, using machine learning algorithms, we analyzed the available BC data from The Cancer Genome Atlas (TCGA) Network and identified two potential genes, TBC1D9 (TBC1 domain family member 9) and MFGE8 (Milk Fat Globule-EGF Factor 8 Protein), that could successfully differentiate TNBC from non-TNBC, irrespective of their heterogeneity. TBC1D9 is under-expressed in TNBC compared with non-TNBC patients, whereas MFGE8 is over-expressed. Overexpression of TBC1D9 is associated with a better prognosis, whereas overexpression of MFGE8 correlates with a poor prognosis. Protein-protein interaction analyses by affinity purification mass spectrometry (AP-MS) and proximity biotinylation (BioID) identified a role for TBC1D9 in maintaining cellular integrity, whereas MFGE8 appears to be involved in various tumor survival processes. These promising genes could serve as biomarkers for TNBC and deserve further investigation, as they have the potential to be developed as therapeutic targets for TNBC.
Triple negative breast cancer accounts for 10-20% of all breast cancers. It is characterized by the absence of estrogen receptor (ER) and progesterone receptor (PR) expression and by the lack of human epidermal growth factor receptor 2 (HER2) overexpression 1. It is the most aggressive form of BC and is very heterogeneous 2. TNBC is further complicated by its high risk of relapse and its poor progression-free survival (PFS) and overall survival (OS) 3. The PFS for metastatic TNBC patients is 3-4 months after treatment failure 4. The 5-year mortality rate for early-stage TNBC after surgery is 37%, and half of these patients relapse 5.
Based on gene expression patterns, TNBC has been classified into six molecular subtypes, namely basal-like 1 (BL1), basal-like 2 (BL2), luminal androgen receptor (LAR), immunomodulatory (IM), mesenchymal (M) and mesenchymal stem-like (MSL), with some tumors remaining unclassified 6. Lehmann et al. (2011) showed that, based on gene expression, each of these subgroups can be further divided into the intrinsic subtypes of BC (luminal A, luminal B, HER2, normal breast-like, basal-like and unclassified) 6. This explains why TNBC has different clinicopathological outcomes in different patients, which makes treatment difficult. On March 8, 2019, the FDA approved the immunotherapy atezolizumab (targeting PD-L1) in combination with chemotherapy (nab-paclitaxel) for the initial treatment of women with advanced TNBC positive for PD-L1 protein expression 7,8. Nevertheless, there is so far no FDA-approved targeted therapy for TNBC patients as a whole 9. TNBC heterogeneity and aggressiveness create an unmet need to identify genes that could serve as biomarkers to differentiate TNBC from other BCs, as well as serve as potential therapeutic targets irrespective of their heterogene...
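The two-marker finding in the TNBC abstract can be illustrated with a toy classifier. The abstract does not say which machine learning models were used, so this sketch swaps in a simple nearest-centroid rule on two features, (TBC1D9, MFGE8) expression; the function names, the assumption of pre-normalized expression values and the training numbers are all hypothetical and only mimic the reported direction of change (low TBC1D9, high MFGE8 in TNBC).

```python
from math import dist
from statistics import fmean
from typing import Dict, Sequence, Tuple

# (TBC1D9, MFGE8) expression for one sample; assumed to be pre-normalized.
Sample = Tuple[float, float]


def fit_centroids(samples: Sequence[Sample], labels: Sequence[str]) -> Dict[str, Sample]:
    """Mean expression profile of each class (nearest-centroid training)."""
    centroids: Dict[str, Sample] = {}
    for label in sorted(set(labels)):
        group = [s for s, lab in zip(samples, labels) if lab == label]
        centroids[label] = (fmean(s[0] for s in group), fmean(s[1] for s in group))
    return centroids


def predict(centroids: Dict[str, Sample], sample: Sample) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(sample, centroids[label]))


# Synthetic, made-up values (not TCGA data).
train = [(1.0, 5.0), (1.5, 4.5), (4.0, 1.0), (4.5, 1.5)]
labels = ["TNBC", "TNBC", "non-TNBC", "non-TNBC"]
model = fit_centroids(train, labels)
print(predict(model, (1.2, 4.8)))  # -> TNBC
```

A nearest-centroid rule is used here only because it makes the two-gene separation easy to read; real expression data would need proper normalization, cross-validation and a model chosen on held-out performance.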
Motivation: Breakthroughs in high-throughput technologies and machine learning methods have enabled a shift towards multi-omics modelling as the preferred means of understanding the mechanisms underlying biological processes. Machine learning enables and improves complex disease prognosis in clinical settings. However, most multi-omic studies primarily use transcriptomics and epigenomics, owing to their over-representation in databases and their earlier technical maturity compared with other omics. For complex phenotypes and mechanisms, failing to leverage all the omics, despite their varying degrees of availability, can prevent an understanding of the underlying biological mechanisms and lead to less robust classifications and predictions.
Results: We propose MOT (Multi-Omic Transformer), a deep learning model based on the transformer architecture that discriminates complex phenotypes (here, cancer types) using five omics data types: transcriptomics (mRNA and miRNA), epigenomics (DNA methylation), copy number variations (CNVs) and proteomics. The model achieves an F1-score of 98.37% across 33 tumour types on a test set without missing omics views and an F1-score of 96.74% on a test set with missing omics views. It also identifies which omic type is required for the best prediction of each phenotype and could therefore guide clinical decision-making when acquiring data to confirm a diagnosis. The model can integrate and analyze five or more omics data types, even with missing omics views, and can identify the omics data essential for multiclass tumour classification. It confirms the importance of each omic view: combined, the omics views allow better discrimination between most cancer types. Our study emphasizes the importance of multi-omic data for improving multiclass cancer classification.
Availability and implementation: MOT source code is available at https://github.com/dizam92/multiomic_predictions.
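How a transformer can cope with missing omics views is easiest to see in a stripped-down attention step. The sketch below assumes, hypothetically, that each available view has already been embedded into a vector of the same size and that a missing view is passed as `None`; learned query/key/value projections, multiple heads and feed-forward layers are omitted, and the function names are illustrative rather than taken from the MOT repository.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def masked_self_attention(views: Sequence[Optional[Vector]]) -> List[Optional[Vector]]:
    """Single-head scaled dot-product attention over omic "view tokens".

    Missing views (None) are excluded from the softmax, which is equivalent to
    giving them a score of -inf; their output slot stays None.
    """
    present = [i for i, v in enumerate(views) if v is not None]
    if not present:
        raise ValueError("at least one omic view must be available")
    d = len(views[present[0]])
    out: List[Optional[Vector]] = [None] * len(views)
    for i in present:
        q = views[i]
        # Identity Q/K/V projections: scores are plain scaled dot products.
        scores = {j: sum(a * b for a, b in zip(q, views[j])) / math.sqrt(d) for j in present}
        m = max(scores.values())  # subtract the max for a numerically stable softmax
        weights = {j: math.exp(s - m) for j, s in scores.items()}
        total = sum(weights.values())
        out[i] = [sum(weights[j] / total * views[j][k] for j in present) for k in range(d)]
    return out


def pool(outputs: Sequence[Optional[Vector]]) -> Vector:
    """Mean-pool the attended tokens of the available views into one vector."""
    kept = [o for o in outputs if o is not None]
    return [sum(col) / len(kept) for col in zip(*kept)]


# Hypothetical 2-dimensional embeddings for the five views named in the
# abstract, with the miRNA and proteomics views missing for this sample.
views = {
    "mRNA": [1.0, 0.0],
    "miRNA": None,
    "DNA methylation": [0.5, 0.5],
    "CNV": [0.0, 1.0],
    "proteomics": None,
}
attended = masked_self_attention(list(views.values()))
print(pool(attended))
```

Because the pooled vector only averages over the views that are present, the same classifier head can in principle be applied whether a sample has two omics views or all five.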
Machine learning (ML) algorithms may help us better understand the complex interactions among factors that influence dietary choices and behaviors. The aim of this study was to explore whether ML algorithms predict vegetable and fruit (VF) consumption more accurately than traditional statistical models. A large array of features (2,452 features derived from 525 variables) encompassing individual and environmental information related to dietary habits and food choices was used in a sample of 1,147 French-speaking adult men and women. Adequate VF consumption, defined as 5 or more servings/d, was measured by averaging data from three web-based 24 h recalls and used as the outcome to predict. Nine ML classification algorithms were compared with two traditional statistical predictive models, logistic regression and penalized regression (Lasso). The performance of the ML algorithms was tested after adjustments such as data normalization, as well as in a series of sensitivity analyses, such as using VF consumption obtained from a web-based food frequency questionnaire (wFFQ) and applying a feature selection algorithm in an attempt to reduce overfitting. Logistic regression and Lasso predicted adequate VF consumption with accuracies of 0.64 (95% confidence interval [CI]: 0.58–0.70) and 0.64 (95% CI: 0.60–0.68), respectively. Among the ML algorithms tested, the most accurate at predicting adequate VF consumption were support vector machines (SVM) with either a radial basis function kernel or a sigmoid kernel, both with an accuracy of 0.65 (95% CI: 0.59–0.71). The least accurate was the SVM with a linear kernel, with an accuracy of 0.55 (95% CI: 0.49–0.61). Using dietary intake data from the wFFQ and applying a feature selection algorithm had little to no impact on the performance of the algorithms.
In summary, ML algorithms and traditional statistical models predicted adequate VF consumption with similar accuracy among adults. These results suggest that further research is needed to explore the true potential of ML for predicting dietary behaviors, which are determined by complex interactions among several individual, social and environmental factors.
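The outcome definition and the reported accuracies with 95% CIs can be made concrete with a short sketch. The 5 servings/d threshold and the averaging of three 24 h recalls come from the study description; the normal-approximation (Wald) interval is an assumed choice, since the abstract does not state how the CIs were computed, and the function names and example data are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def adequate_vf(recalls: Sequence[float], threshold: float = 5.0) -> int:
    """1 if mean daily VF servings across the 24 h recalls is >= threshold, else 0."""
    return int(fmean(recalls) >= threshold)


def accuracy_wald_ci(y_true: Sequence[int], y_pred: Sequence[int], z: float = 1.96) -> Tuple[float, float, float]:
    """Accuracy with a normal-approximation (Wald) CI, clipped to [0, 1].

    This is an assumed method for illustration, not necessarily the study's.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    half = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - half), min(1.0, acc + half)


# Made-up example: three recalls per participant -> binary outcome labels.
recalls = ([4, 5, 6], [2, 3, 3.5], [6, 7, 5], [1, 2, 2])
y_true = [adequate_vf(r) for r in recalls]
y_pred = [1, 0, 0, 0]  # hypothetical model predictions
print(accuracy_wald_ci(y_true, y_pred))
```

As a plausibility check only, with a hypothetical test set of 250 participants and an accuracy of 0.64, the Wald half-width would be 1.96 × √(0.64 × 0.36 / 250) ≈ 0.06, the same order as the intervals reported above; this does not reconstruct the authors' actual test-set size or procedure.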