José M. Peregrín-Alvarez scite author profile

Identification of protein-protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization-time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein-protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein-protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.
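
The abstract names a Markov clustering algorithm (MCL) as the step that turns pairwise interactions into complexes. Below is a minimal, dense-matrix sketch of generic MCL (expansion, inflation, pruning) on a toy unweighted graph. It is not the authors' pipeline. The inflation value of 2.0, the pruning threshold and the function names are illustrative assumptions. A data set of thousands of proteins would call for sparse matrices and, plausibly, weighted edges such as interaction probabilities.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Rescale each column to sum to 1 (a column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product (fine for toy graphs only)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(edges: List[Tuple[str, str]], inflation: float = 2.0,
        prune: float = 1e-5, max_iter: int = 100,
        tol: float = 1e-9) -> List[Set[str]]:
    """Markov clustering of an undirected, unweighted graph (sketch)."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return []
    index: Dict[str, int] = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)

    # Adjacency matrix with self-loops, then column-normalised.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = m[index[b]][index[a]] = 1.0
    m = _normalize_columns(m)

    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: two-step random-walk flow
        # Inflation: raise entries to a power, dropping tiny ones (pruning).
        inflated = _normalize_columns(
            [[v ** inflation if v >= prune else 0.0 for v in row]
             for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # At convergence each column keeps only its attractors, each holding at
    # least 1/n of the flow; an attractor row lists its cluster's members.
    cutoff = 0.5 / n
    clusters: List[Set[str]] = []
    seen: Set[FrozenSet[str]] = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > cutoff)
        if members and members not in seen:
            seen.add(members)
            clusters.append(set(members))
    return clusters


# Hypothetical toy graph: two triangles joined by a single bridging edge (C-D).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("D", "F")]
print(mcl(toy_edges))
```

On this toy graph the bridge is cut and the two triangles come back as separate clusters. Raising `inflation` generally yields smaller, tighter clusters, which is the main knob MCL users tune.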

Interaction network containing conserved and essential protein complexes in Escherichia coli

Butland, Peregrín-Alvarez, et al. 2005. Nature.

Proteins often function as components of multi-subunit complexes. Despite its long history as a model organism, no large-scale analysis of protein complexes in Escherichia coli has yet been reported. To this end, we have targeted DNA cassettes into the E. coli chromosome to create carboxy-terminal, affinity-tagged alleles of 1,000 open reading frames (approximately 23% of the genome). A total of 857 proteins, including 198 of the most highly conserved, soluble non-ribosomal proteins essential in at least one bacterial species, were tagged successfully, and 648 of these could be purified to homogeneity and their interacting protein partners identified by mass spectrometry. An interaction network of protein complexes involved in diverse biological processes was uncovered and validated by sequential rounds of tagging and purification. This network includes many new interactions as well as interactions predicted based solely on genomic inference or limited phenotypic data. This study provides insight into the function of previously uncharacterized bacterial proteins and the overall topology of a microbial interaction network, the core components of which are broadly conserved across Prokaryota.

Draft Genome of the Filarial Nematode Parasite Brugia malayi

Ghedin, Wang, Spiro, et al. 2007. Science.

Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the approximately 90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict approximately 11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during approximately 350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.

The conservation and evolutionary modularity of metabolism

2009

Metabolic modularity: a novel evolutionary analysis of metabolic networks across 26 taxa reveals a highly conserved but flexible core of metabolic enzymes.

Background: Cellular metabolism is a fundamental biological system consisting of myriads of enzymatic reactions that together fulfill the basic requirements of life. The recent availability of vast amounts of sequence data from diverse sets of organisms provides an opportunity to systematically examine metabolism from a comparative perspective. Here we supplement existing genome and protein resources with partial genome datasets derived from 193 eukaryotes to present a comprehensive survey of the conservation of metabolism across 26 taxa representing the three domains of life.

The Phylogenetic Extent of Metabolic Enzymes and Pathways

Peregrín-Alvarez, Tsoka, Ouzounis. 2003. Genome Res.

The evolution of metabolic enzymes and pathways has been a subject of intense study for more than half a century. Yet previous studies have so far focused on a small number of enzyme families or biochemical pathways. Here, we examine the phylogenetic distribution of the full known metabolic complement of Escherichia coli, using sequence comparison against taxon-specific databases. Half of the metabolic enzymes have homologs in all domains of life, representing families involved in some of the most fundamental cellular processes. We thus show, for the first time and comprehensively, that metabolism is conserved at the enzyme level. In addition, our analysis suggests that despite the sequence conservation and the extensive phylogenetic distribution of metabolic enzymes, their groupings into biochemical pathways are much more variable than previously thought.
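
The analysis rests on a phylogenetic profile: for each enzyme, the set of domains of life in which a homolog is detected. The minimal sketch below assumes the homology searches have already been run and reduced to such sets. The enzyme names and profiles are hypothetical. It shows why an enzyme-level statistic and a pathway-level statistic can diverge: a pathway may contain universally conserved enzymes without being universal as a whole.

```python
from typing import Dict, FrozenSet, List, Set

# The three domains of life. Each profile is assumed to come from an upstream
# sequence search against taxon-specific databases (not shown here).
DOMAINS: FrozenSet[str] = frozenset({"Archaea", "Bacteria", "Eukarya"})


def fraction_universal(profiles: Dict[str, Set[str]]) -> float:
    """Fraction of enzymes whose homologs are found in every domain."""
    if not profiles:
        return 0.0
    universal = sum(1 for hits in profiles.values() if DOMAINS <= hits)
    return universal / len(profiles)


def pathway_is_universal(pathway: List[str],
                         profiles: Dict[str, Set[str]]) -> bool:
    """A pathway counts as universal only if all of its enzymes are."""
    return all(DOMAINS <= profiles[enzyme] for enzyme in pathway)


# Hypothetical toy profiles (enzyme -> domains with a detected homolog).
profiles = {
    "enzA": {"Archaea", "Bacteria", "Eukarya"},
    "enzB": {"Bacteria"},
    "enzC": {"Archaea", "Bacteria", "Eukarya"},
    "enzD": {"Bacteria", "Eukarya"},
}

print(fraction_universal(profiles))                      # 0.5
print(pathway_is_universal(["enzA", "enzB"], profiles))  # False
print(pathway_is_universal(["enzA", "enzC"], profiles))  # True
```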

The Modular Organization of Protein Interactions in Escherichia coli

Peregrín-Alvarez, Xiong, et al. 2009. PLoS Comput Biol.

Escherichia coli serves as an excellent model for the study of fundamental cellular processes such as metabolism, signalling and gene expression. Understanding the function and organization of proteins within these processes is an important step towards a ‘systems’ view of E. coli. Integrating experimental and computational interaction data, we present a reliable network of 3,989 functional interactions between 1,941 E. coli proteins (∼45% of its proteome). These were combined with a recently generated set of 3,888 high-quality physical interactions between 918 proteins and clustered to reveal 316 discrete modules. In addition to known protein complexes (e.g., RNA and DNA polymerases), we identified modules that represent biochemical pathways (e.g., nitrate regulation and cell wall biosynthesis) as well as batteries of functionally and evolutionarily related processes. To aid the interpretation of modular relationships, several case examples are presented, including both well characterized and novel biochemical systems. Together these data provide a global view of the modular organization of the E. coli proteome and yield unique insights into structural and evolutionary relationships in bacterial networks.

The origins of apicomplexan sequence innovation

Wasmuth, Daub, Peregrín-Alvarez, et al. 2009. Genome Res.

The Apicomplexa are a group of phylogenetically related parasitic protists that include Plasmodium, Cryptosporidium, and Toxoplasma. Together they impose a major global burden on human health and economies. To meet this challenge, several international consortia have generated vast amounts of sequence data for many of these parasites. Here, we exploit these data to perform a systematic analysis of protein family and domain incidence across the phylum. A total of 87,736 protein sequences were collected from 15 apicomplexan species. These were compared with three protein databases, including the partial genome database, PartiGeneDB, which increases the breadth of taxonomic coverage. From these searches we constructed taxonomic profiles that reveal the extent of apicomplexan sequence diversity. Sequences without a significant match outside the phylum were designated apicomplexan-specialized. These were collated into 9,134 discrete protein families and placed in the context of the apicomplexan phylogeny, identifying the putative origin of each family. Most apicomplexan families were associated with an individual genus or species. Interestingly, many genus-specific innovations were associated with specialized host cell invasion and/or parasite survival processes. In contrast, families reflecting more ancestral relationships were enriched in generalized housekeeping functions such as translation and transcription, which have diverged within the apicomplexan lineage. Protein domain searches revealed 192 domains not previously reported in apicomplexans, together with a number of novel domain combinations. We highlight domains that may be important to parasite survival.
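
Two steps in this abstract can be made concrete: flagging a sequence as phylum-specialized when it has no significant hit outside the phylum, and placing a family on the species tree. The minimal sketch below takes one plausible reading of "putative origin": the lowest common ancestor of every species that carries the family. That reading is an assumption, not necessarily the authors' procedure. The tree is a hypothetical, simplified illustration and not an authoritative apicomplexan phylogeny. All function names are likewise hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical, simplified species tree given as child -> parent links.
PARENT: Dict[str, str] = {
    "Plasmodium falciparum": "Plasmodium",
    "Plasmodium vivax": "Plasmodium",
    "Plasmodium": "Hematozoa",
    "Theileria parva": "Hematozoa",
    "Hematozoa": "Apicomplexa",
    "Toxoplasma gondii": "Coccidia",
    "Coccidia": "Apicomplexa",
    "Cryptosporidium parvum": "Apicomplexa",
}


def is_specialized(hit_taxa: Set[str], phylum: str = "Apicomplexa") -> bool:
    """True when no significant hit falls outside the phylum."""
    return hit_taxa <= {phylum}


def lineage(node: str, parent: Dict[str, str]) -> List[str]:
    """Path from a node up to the root of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def putative_origin(species: List[str],
                    parent: Dict[str, str] = PARENT) -> str:
    """Lowest common ancestor of a non-empty list of species."""
    paths = [lineage(s, parent) for s in species]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first species, the first shared node is the LCA.
    return next(node for node in paths[0] if node in shared)


print(putative_origin(["Plasmodium falciparum", "Plasmodium vivax"]))  # Plasmodium
print(putative_origin(["Plasmodium vivax", "Theileria parva"]))        # Hematozoa
print(putative_origin(["Toxoplasma gondii", "Cryptosporidium parvum"]))  # Apicomplexa
```

A family confined to one species maps to that species itself, which matches the abstract's observation that most specialized families sit at the genus or species level.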

An Exponential Core in the Heart of the Yeast Protein Interaction Network

Pereira-Leal, Audit, Peregrín-Alvarez, et al. 2004.

Protein interactions in budding yeast have been shown to form a scale-free network, a feature of other organized networks such as bacterial and archaeal metabolism and the World Wide Web. Here, we study the connections established by yeast proteins and discover a preferential attachment between essential proteins. The essential-essential connections are long ranged and form a subnetwork where the giant component includes 97% of these proteins. Unexpectedly, this subnetwork displays an exponential connectivity distribution, in sharp contrast to the scale-free topology of the complete network. Furthermore, the wide phylogenetic extent of these core proteins and interactions provides evidence that they represent the ancestral state of the yeast protein interaction network. Finally, we propose that this core exponential network may represent a generic scaffold around which organism-specific and taxon-specific proteins and interactions coalesce.
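
The essential-essential analysis comes down to three graph operations. First, take the subnetwork induced by the essential proteins. Second, measure what share of them falls in the largest connected component. Third, tabulate the degree distribution P(k). One would then plot P(k) on semi-log axes, where a straight line suggests exponential decay, and on log-log axes, where a straight line suggests a power law. The minimal sketch below runs on a hypothetical toy network; the protein names and essentiality labels are made up, and this is not the authors' code.

```python
from collections import Counter, deque
from typing import Dict, Iterable, Set, Tuple


def induced_subgraph(edges: Iterable[Tuple[str, str]],
                     keep: Set[str]) -> Dict[str, Set[str]]:
    """Adjacency sets restricted to edges whose endpoints are both in `keep`."""
    adj: Dict[str, Set[str]] = {node: set() for node in keep}
    for a, b in edges:
        if a in keep and b in keep and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def giant_component_fraction(adj: Dict[str, Set[str]]) -> float:
    """Share of nodes in the largest connected component (breadth-first search)."""
    seen: Set[str] = set()
    largest = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        largest = max(largest, size)
    return largest / len(adj) if adj else 0.0


def degree_distribution(adj: Dict[str, Set[str]]) -> Dict[int, float]:
    """P(k): fraction of nodes with exactly k neighbours in the subnetwork."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


# Hypothetical toy network: E* proteins are labelled essential, N* are not.
toy_edges = [("E1", "E2"), ("E2", "E3"), ("E3", "N1"), ("E4", "N1"), ("N1", "N2")]
essential = {"E1", "E2", "E3", "E4"}
core = induced_subgraph(toy_edges, essential)
print(giant_component_fraction(core))  # 0.75: E4 has no essential partner
print(degree_distribution(core))       # {0: 0.25, 1: 0.5, 2: 0.25}
```

Essential proteins with no essential partner are kept as isolated nodes, so they count in the denominator of the giant-component share, as a "97% of these proteins" figure requires.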
