As increasingly large amounts of data from genome and other sequencing projects become available, new approaches are needed to determine the functions of the proteins these genes encode. We show how large-scale computational analysis can help to address this challenge by linking functional information to sequence and structural similarities using protein similarity networks. Network analyses using three functionally diverse enzyme superfamilies illustrate the use of these approaches for facile updating and comparison of available structures for a large superfamily, for creation of functional hypotheses for metagenomic sequences, and for summarizing the limits of our functional knowledge about even well-studied superfamilies.

In the post-genomic era, access to large amounts of gene sequence and protein structure data has become the norm; by mid-2011, the number of protein sequences in the UniProt/TrEMBL Database (1) topped 16 million, whereas the Protein Data Bank (2) contained over 73,000 structures. Additional millions of sequences are becoming available from newer types of genome projects, including metagenomics projects, with one report for the human gut microbiome accounting for an additional 3.3 million microbial genes (3). Because experimental determination of protein function lags far behind the rate of sequence and structure determination, improved computational methods for function prediction are urgently needed to help bridge the gap between sequenced genes and functionally characterized protein products. In response, new methods are rapidly being developed to address these challenges, and community efforts are now under way to increase the pace of experimental and computational prediction of protein function (4, 5).
Another large-scale effort (http://www.nigms.nih.gov/News/Results/gluegrant_051510.htm) aims to develop a combined experimental/computational strategy for the prediction of the reaction and substrate specificity of enzymes, the protein class that is the subject of this minireview. Additionally, community challenges such as the Critical Assessment of Function Annotations (CAFA) (Automated Function Prediction 2011) have been mounted to assess and improve the current state of automated prediction of protein function. Viewing the glass as half-full, progress in sequencing and annotation over the last decade led one group to estimate that some functional features can be assigned to as much as 85% of proteins in completely sequenced genomes (6). From a more skeptical perspective, more recent assessments of annotation accuracy suggest that computational approaches are especially prone to misannotation (7, 8), indicating that significant challenges for functional inference remain.

This minireview focuses on how new insights about protein structure-function relationships and functional inference can be obtained from large-scale analyses of proteins, specifically for "functionally diverse" enzyme superfamilies. We define these types of superfamilies as sets of homologous proteins that conserve structura...