“…The Protein-protein Interaction Prediction Engine (PIPE), developed by researchers in the Carleton University Bioinformatics Research Group, is a method for predicting novel protein-protein interactions (PPI). Originally published in [9] as a method for predicting PPI in the yeast species S. cerevisiae (more commonly known as Baker's yeast), the computational and classification performance of PIPE have since been improved in [63], [64]. In [65], the method was shown to be applicable to a variety of species including C. elegans, E. coli, H. sapiens, S. cerevisiae, and S. pombe.…”
Section: The Protein-protein Interaction Prediction Engine (PIPE); citation type: mentioning
confidence: 99%
“…The score is then the average value of the modified landscape. The second method of summarizing the landscape into a single score is known as the similarity-weighted (SW) method [64]. This method was developed because certain sliding windows are seen in a very large number of proteins, but are not responsible for supporting or mediating interactions.…”
Section: Figure 10: Overview Of Pipe Algorithm For One Pair Of Slidin; citation type: mentioning
confidence: 99%
“…After the score for each pair of windows is divided by the normalization factor, the overall SW score is simply the average value of the landscape. The SW method was developed in [64] and compared to the original method; overall the SW method was shown to be superior to the original method at all operating points, and is the main score used by PIPE. Figure 13: SW method for modifying landscape (reproduced from [64]).…”
Section: Figure 10: Overview Of Pipe Algorithm For One Pair Of Slidin; citation type: mentioning
confidence: 99%
“…The SW method was developed in [64] and compared to the original method; overall the SW method was shown to be superior to the original method at all operating points, and is the main score used by PIPE. Figure 13: SW method for modifying landscape (reproduced from [64]). The similarity list of Ai given in the figure is referred to as simprots(WAi) in this work.…”
Section: Figure 10: Overview Of Pipe Algorithm For One Pair Of Slidin; citation type: mentioning
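The quoted snippets describe the similarity-weighted (SW) score as a per-cell normalization of the landscape followed by an average. The sketch below illustrates that idea only. The snippets do not state the exact normalization factor, so the product of the two similarity-list sizes used here is an assumption, and the names `sw_score`, `sim_counts_a`, and `sim_counts_b` are hypothetical rather than taken from PIPE.

```python
from typing import Sequence


def sw_score(
    landscape: Sequence[Sequence[float]],
    sim_counts_a: Sequence[int],
    sim_counts_b: Sequence[int],
) -> float:
    """Average of a landscape after per-cell normalization.

    landscape[i][j] -- raw evidence for window i of protein A paired with
                       window j of protein B.
    sim_counts_a[i] -- number of proteins similar to window i of A,
                       i.e. the size of simprots(WAi).
    sim_counts_b[j] -- the same quantity for window j of B.

    ASSUMPTION: the normalization factor is the product of the two counts.
    The quoted text only says that each cell is divided by "the
    normalization factor"; the real factor may differ.
    """
    rows = len(landscape)
    cols = len(landscape[0]) if rows else 0
    if rows == 0 or cols == 0:
        return 0.0

    total = 0.0
    for i in range(rows):
        for j in range(cols):
            norm = sim_counts_a[i] * sim_counts_b[j]
            # Cells with no similar proteins contribute nothing.
            if norm > 0:
                total += landscape[i][j] / norm

    # The overall score is the mean over every cell of the landscape.
    return total / (rows * cols)
```

Under this assumption, a window that is similar to a very large number of proteins gets a large divisor, so its cells contribute little to the average. That matches the stated motivation for the SW method.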
This thesis explores issues arising when one attempts to predict protein-protein interactions (PPI) involving multiple species using the Protein-protein Interaction Prediction Engine (PIPE) method. In cross-species predictions, where one predicts PPI in a target species given known PPI in a different training species, we showed that prediction performance is inversely correlated to the evolutionary distance between training and target species. With a change in the score calculation, we improved the area under the precision-recall curve by 45% when using seven well-studied species to predict an eighth. In inter-species predictions, one attempts to predict interactions between proteins arising from two different species, such as a host and a pathogen. For the first time, we have shown that PIPE is able to predict such inter-species PPI by predicting 229 novel PPI between HIV and human at an estimated precision of 82% (100:1 class imbalance). Lastly, by modifying a main data structure of PIPE, we also improved the speed of the PIPE algorithm by a factor of 53x when predicting H. sapiens PPI. Using the methods developed in this thesis, we have predicted all possible PPI between soybean and the Soybean Cyst Nematode pathogen. Collaborators at Agriculture and Agri-Food Canada will be pursuing and validating these predictions as they seek to combat this costly pest. Acknowledgements: I would like to thank my supervisor, James Green, for his support, patience and guidance throughout this experience. I am grateful for the opportunity he provided me with to pursue this research and for all the knowledge he shared with me throughout this time.
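The abstract reports precision at a stated 100:1 class imbalance. As a minimal sketch of why the imbalance ratio has to accompany a precision figure, the function below rescales precision from recall and specificity using the standard identity precision = TPR / (TPR + r × FPR), where r is the number of negatives per positive. The function name and the example operating point are hypothetical and are not values from the thesis.

```python
def precision_at_imbalance(
    recall: float,
    specificity: float,
    negatives_per_positive: float = 100.0,
) -> float:
    """Expected precision for a classifier at a given class imbalance.

    With P positives and r * P negatives:
        TP = recall * P
        FP = (1 - specificity) * r * P
    so precision = recall / (recall + r * (1 - specificity)).
    """
    false_positive_rate = 1.0 - specificity
    denominator = recall + negatives_per_positive * false_positive_rate
    if denominator == 0.0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return recall / denominator


# Hypothetical operating point, for illustration only.
balanced = precision_at_imbalance(0.5, 0.999, negatives_per_positive=1.0)
imbalanced = precision_at_imbalance(0.5, 0.999, negatives_per_positive=100.0)
```

The same recall and specificity give a noticeably lower precision at 100:1 than at 1:1, which is why the ratio is quoted alongside the precision.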
“…PIPE has two scoring methods to measure the accuracy of the predictions, namely the PIPE score and the sim-weighted score. The sim-weighted score was used in this study because it produces fewer false positives compared to the traditional PIPE score [39].…”
Section: Pipe Setup and Predictions Of PPIs; citation type: mentioning
Alternative Splicing (AS) is a process that is believed to have links to cellular function changes and some diseases in humans. Although AS was first discovered in the 1970s, not much research has been conducted on its functional implications at the proteome level. This study aims to use PIPE, a protein-protein interaction prediction algorithm, along with a tissue expression dataset to build a pipeline that differentiates between AS isoform products by analyzing isoform sequence changes, functional changes, and tissue expression changes that AS introduces. The study found that isoform sequence changes in alternative isoforms tend to be conserved deletions of amino-acid sub-sequences. The study also found that there is a statistically significant overlap between PIPE-predicted protein-protein interaction (PPI) network changes and tissue expression changes of alternatively spliced isoforms (ASIs) relative to their canonical isoforms (CIs), with a p-value of 8.25×10⁻⁵. Finally, among the analysis pipeline's top ten genes with predicted significant ASIs' PPI network changes, LMO2, THOC2, and UBE2L3 are genes that were suspected of having links to different diseases such as basal-type breast cancer, intellectual disability (ID) and numerous autoimmune diseases according to literature studies. Firstly, I would like to thank my supervisor Dr. Frank Dehne for his outstanding support throughout my Master's degree. Dr. Dehne provided me with invaluable advice, support, and kindness during my Master's journey. I could not have asked for a better supervisor.
The need for larger-scale and increasingly complex protein-protein interaction (PPI) prediction tasks demands that state-of-the-art predictors be highly efficient and adapted to inter- and cross-species predictions. Furthermore, the ability to generate comprehensive interactomes has enabled the appraisal of each PPI in the context of all predictions, leading to further improvements in classification performance in the face of extreme class imbalance using the Reciprocal Perspective (RP) framework. We here describe the PIPE4 algorithm. Adaptation of the PIPE3/MP-PIPE sequence preprocessing step led to upwards of 50x speedup and the new Similarity Weighted Score appropriately normalizes for window frequency when applied to any inter- and cross-species prediction schemas. Comprehensive interactomes for three prediction schemas are generated: (1) cross-species predictions, where Arabidopsis thaliana is used as a proxy to predict the comprehensive Glycine max interactome, (2) inter-species predictions between Homo sapiens-HIV1, and (3) a combined schema involving both cross- and inter-species predictions, where both Arabidopsis thaliana and Caenorhabditis elegans are used as proxy species to predict the interactome between Glycine max (the soybean legume) and Heterodera glycines (the soybean cyst nematode). Comparing PIPE4 with the state-of-the-art resulted in improved performance, indicating that it should be the method of choice for complex PPI prediction schemas. The elucidation of protein-protein interaction (PPI) networks is central to molecular biology research. Necessary to producing mechanistic models of cellular processes, PPI networks additionally contribute to challenges such as the prediction of gene function [1-3], identification of disease genes [4], and pharmaceutical discovery [5,6]. Computational PPI prediction techniques have been developed to supplement and guide wet-laboratory experimental work.
The last decade has seen increased computational demand in both scale and complexity of PPI predictors. Predicting comprehensive interactomes (the set of all possible pairwise PPIs in or between proteomes) has only recently become possible with the advent of high-performance computing infrastructure and algorithmic optimizations. While methodologically diverse in their implementation, PPI prediction tools generally exploit information from the set of known PPIs (previously confirmed using classical wet-laboratory techniques) to determine whether any two query proteins will physically interact. The utility and scalability of any one method is subject to the information it leverages. Structure-based methods, at one extreme, require the three-dimensional (3D) characterization of each protein and therefore suffer from low coverage of the proteome. While useful for determining highly specific PPI networks, many methods require template-based modelling, which tends to be computationally taxing [7-9]. Furthermore, even with complete 3D structural information of each protein in an organism's proteome, the computational time comp...