Abstract: Colon adenocarcinoma (COAD) is the most common histologic subtype of colorectal cancer (CRC), and its prognosis is poor. Unlike traditional molecular biology research, which is limited to analyzing the function of a single gene or protein in malignant tumors, the weighted gene correlation network analysis (WGCNA) technique describes the gene association model among different samples in order to identify highly collaborative genes. In this study, a computational strategy was used to conduct a syste…
“…For example, in Fig. 10 A, C4orf19 (chromosome 4 open reading frame 19) is broadly expressed across human cell types and tissues [ 42 ], with high protein levels in the kidney, liver, and GI tract [ 43 , 44 ] and while little is known about its function, an observed relationship between C4orf19 and colorectal cancer suggests that high expression levels might have some value as a marker for colorectal cancer [ 45 ], although elevated C4orf19 expression is also reported to show a favorable association with renal cancer survival [ 43 , 44 ]. Notably, four of the other proteins in this cluster (PDCD10, STK24, STK25, STK26) are known to associate into a complex with roles in maintaining epithelial integrity [ 46 , 47 ] and kidney water balance by regulating aquaporin trafficking and abundance in kidney tubule epithelial cells [ 48 ], suggesting a potential role for C4orf19 in normal kidney function.…”
Background
Proteins often assemble into higher-order complexes to perform their biological functions. Such protein–protein interactions (PPI) are often experimentally measured for pairs of proteins and summarized in a weighted PPI network, to which community detection algorithms can be applied to define the various higher-order protein complexes. Current methods include unsupervised and supervised approaches, often assuming that protein complexes manifest only as dense subgraphs. Supervised approaches focus only on learning which subgraphs correspond to complexes, not on how to find them in a network, a task that is currently solved using heuristics. However, learning to walk trajectories on a network to identify protein complexes leads naturally to a reinforcement learning (RL) approach, a strategy not extensively explored for community detection. Here, we develop and evaluate a reinforcement learning pipeline for community detection on weighted protein–protein interaction networks to detect new protein complexes. The algorithm is trained to calculate the value of different subgraphs encountered while walking on the network to reconstruct known complexes. A distributed prediction algorithm then scales the RL pipeline to search for novel protein complexes on large PPI networks.
Results
The reinforcement learning pipeline is applied to a human PPI network consisting of 8k proteins and 60k PPI, which results in 1,157 protein complexes. The method demonstrated competitive accuracy with improved speed compared to previous algorithms. We highlight protein complexes containing minimally characterized proteins, including C4orf19, C18orf21, and KIAA1522. Additionally, the results suggest that TMCO4 is a putative additional subunit of the KICSTOR complex and confirm, by 3D structural modeling, the involvement of C15orf41 in a higher-order complex with HIRA, CDAN1, and ASF1A.
Conclusions
Reinforcement learning offers several distinct advantages for community detection, including scalability and knowledge of the walk trajectories defining those communities. Applied to currently available human protein interaction networks, this method had comparable accuracy with other algorithms and notable savings in computational time, and in turn, led to clear predictions of protein function and interactions for several uncharacterized human proteins.
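The idea of walking on a weighted network and judging each candidate subgraph can be illustrated with a small sketch. This is not the authors' pipeline: a fixed density threshold stands in for the learned value function, and the toy network, its weights, and the names `EDGES`, `density`, `grow_complex`, and `min_density` are all hypothetical.

```python
from itertools import combinations

# Toy weighted PPI network; proteins and weights are made up for illustration.
EDGES = {
    ("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.85,
    ("C", "D"): 0.2,
    ("D", "E"): 0.9, ("D", "F"): 0.8, ("E", "F"): 0.95,
}


def weight(u, v):
    """Return the interaction weight between u and v (0.0 if no edge)."""
    return EDGES.get((u, v), EDGES.get((v, u), 0.0))


def neighbors(node):
    """Return all nodes sharing an edge with the given node."""
    return {b if a == node else a for (a, b) in EDGES if node in (a, b)}


def density(nodes):
    """Weighted density: total edge weight divided by the number of node pairs."""
    if len(nodes) < 2:
        return 0.0
    pairs = list(combinations(sorted(nodes), 2))
    return sum(weight(u, v) for u, v in pairs) / len(pairs)


def grow_complex(seed, min_density=0.5):
    """Walk outward from seed, greedily adding the neighbor that keeps the
    subgraph densest, and stop once the best extension drops below
    min_density. Returns the walk trajectory (nodes in order of addition).
    The threshold is an assumed stand-in for a learned value function."""
    trajectory = [seed]
    while True:
        current = set(trajectory)
        frontier = set().union(*(neighbors(n) for n in current)) - current
        if not frontier:
            return trajectory
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(frontier), key=lambda n: density(current | {n}))
        if density(current | {best}) < min_density:
            return trajectory
        trajectory.append(best)


if __name__ == "__main__":
    for seed in ("A", "E"):
        print(seed, grow_complex(seed))
```

Returning the trajectory rather than only the final node set mirrors the point made above that the walk itself is informative. In an RL setting, the stopping and extension decisions would come from values learned on known complexes rather than from a hand-picked threshold.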
“…For example, in Figure 8A , C4orf19 (chromosome 4 open reading frame 19) is broadly expressed across human cell types and tissues [35], with high protein levels in the kidney, liver, and GI tract [36][37] and while little is known about its function, an observed relationship between C4orf19 and colorectal cancer suggests that high expression levels might have some value as a marker for colorectal cancer [38], although elevated C4orf19 expression is also reported to show a favorable association with renal cancer survival [36][37]. Notably, four of the other proteins in this cluster (PDCD10, STK24, STK25, STK26) are known to associate into a complex with roles in maintaining epithelial integrity [39][40] and kidney water balance by regulating aquaporin trafficking and abundance in kidney tubule epithelial cells [41], suggesting a potential role for C4orf19 in normal kidney function.…”
Many, if not most, proteins assemble into higher-order complexes to perform their biological functions. Such protein-protein interactions (PPI) are often experimentally measured for pairs of proteins and summarized in a weighted PPI network, to which community detection algorithms can be applied to define the various higher-order protein complexes. Current methods, which include both unsupervised and supervised approaches, often assume that protein complexes manifest only as dense subgraphs, and in the case of supervised approaches, focus only on learning which subgraphs correspond to complexes, not how to find them in a network, a task that is currently solved using heuristics. However, learning to walk trajectories on a network with the goal of finding protein complexes lends itself naturally to a reinforcement learning (RL) approach, a strategy that has not been extensively explored for community detection. Here, we evaluated the use of a reinforcement learning pipeline for community detection in weighted protein-protein interaction networks to detect new protein complexes. Using known complexes, the algorithm is trained to calculate the value of different possible subgraph densities in the process of walking on the network to find a protein complex. Then, a distributed prediction algorithm scales the RL pipeline to search for protein complexes on large PPI networks. The reinforcement learning pipeline applied to a human PPI network consisting of 8k proteins and 60k PPI results in 1,157 protein complexes and shows competitive accuracy with improved speed when compared to previous algorithms. We highlight protein complexes harboring minimally characterized proteins including C4orf19, C18orf21, and KIAA1522, suggest TMCO4 to be a putative additional subunit of the KICSTOR complex, and confirm the participation of C15orf41 in a higher-order complex with CDAN1, ASF1A, and HIRA by 3D structural modeling.
Reinforcement learning offers several distinct advantages for community detection, including scalability and knowledge of the walk trajectories defining those communities.
“…Wang W. et al. reported that regulated C4ORF19 could promote colon adenocarcinoma cell proliferation, invasion, and migration ( 26 ). However, our understanding of the role of C4ORF19 in clear cell renal carcinoma is not clear so far.…”
Background
Renal cell carcinoma (RCC) accounts for 90% of renal cancers, of which clear cell carcinoma (ccRCC) is the most common histological type. The process of alternative splicing (AS) contributes to protein diversity, and the dysregulation of protein diversity may have a great influence on tumorigenesis. We developed a prognostic signature and comprehensively analyzed the role of tumor immune microenvironment (TIME) and immune checkpoint blocking (ICB) treatment in ccRCC.
Methods
To identify prognosis-related AS events, univariate Cox regression was used and functional annotation was performed using gene set enrichment analysis (GSEA). Prognostic signatures were developed based on multivariate Cox, univariate Cox, and LASSO regression models. To assess their prognostic value, the proportional hazards model, Kruskal–Wallis analysis, and ROC curves were used. To obtain a better understanding of TIME in ccRCC, the ESTIMATE R package, single sample gene set enrichment analysis (ssGSEA) algorithm, CIBERSORT method, and the tumor immune estimation resource (TIMER) were applied. The database was searched to verify the expression of C4ORF19 in tumor and normal samples. Regulatory networks for AS-splicing factors (SFs) were visualized using Cytoscape 3.9.1.
Results
Screening identified 9,347 AS events associated with the survival of ccRCC patients. A total of eight AS prognostic signatures were developed with stable prognostic predictive accuracy based on splicing subtypes. In addition, a qualitative prognostic nomogram was developed, and the prognostic prediction showed high effectiveness. We also found that the combined signature was significantly associated with the diversity of TIME and ICB treatment-related genes. C4ORF19 might become an important prognostic factor for ccRCC. Finally, the AS-SF regulatory network was established to clearly reveal the potential function of SFs.
Conclusion
We found novel and robust indicators (i.e., risk signature, prognostic nomogram, etc.) for the prognostic prediction of ccRCC. A new and reliable prognostic nomogram was established to quantitatively predict the clinical outcome. The AS-SF networks could provide a new way for the study of potential regulatory mechanisms, and the important roles of AS events in the context of TIME and immunotherapy efficiency were exhibited. C4ORF19 was found to be a vital gene in TIME and ICB treatment.