The Human Proteome Project (HPP)
is leading the international effort
to characterize the human proteome. Although the project initially
focused on the detection of missing proteins, a new challenge arose
from the need to assign biological functions to the uncharacterized
human proteins and to describe their implications in human diseases.
This challenge, neXt-CP50, addressed not only the uncharacterized
proteins with experimental evidence (uPE1 proteins) but also the
uncharacterized missing proteins (uMPs). In this work,
we developed a new bioinformatic approach to infer biological annotations
for the uPE1 proteins and uMPs based on a “guilt-by-association”
analysis using public RNA-Seq data sets. We used the expression correlation
between these proteins and the well-characterized PE1 proteins to construct
a network. We then applied the PageRank algorithm to this
network to identify the most relevant nodes, which correspond to the biological
annotations of the uncharacterized proteins. All of the generated
information was stored in a database. In addition, we implemented
the web application UPEFinder to facilitate access to this new resource. This information
is especially relevant for HPP researchers who are interested
in generating and validating new hypotheses about the functions
of these proteins. Both the database and the web application are publicly
available.
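
The guilt-by-association idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the network layout (query protein linked to correlated PE1 genes, which are linked to their annotation terms), the Pearson threshold `min_corr`, the personalized PageRank variant, and all gene and term names are assumptions made for illustration on toy data.

```python
import numpy as np

def pagerank(adj, personalization, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a weighted adjacency matrix (row -> column)."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-normalise edge weights; nodes without out-edges jump to the
    # personalization distribution instead.
    trans = np.where(out > 0, adj / np.where(out > 0, out, 1.0), personalization)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = damping * rank @ trans + (1 - damping) * personalization
        if np.abs(new - rank).sum() < tol:
            break
        rank = new
    return new

def rank_annotations(expr, genes, query, annotations, min_corr=0.7):
    """Rank annotation terms for `query` via correlated, annotated genes.

    expr        : array (genes x samples) of expression values
    annotations : dict mapping characterized gene -> list of terms
    min_corr    : hypothetical Pearson cutoff for drawing an edge
    """
    idx = {g: i for i, g in enumerate(genes)}
    corr = np.corrcoef(expr)[idx[query]]
    # Keep only characterized genes that co-express with the query.
    neighbours = {g: corr[idx[g]] for g in annotations
                  if g != query and corr[idx[g]] >= min_corr}
    terms = sorted({t for g in neighbours for t in annotations[g]})
    nodes = [query] + list(neighbours) + terms
    pos = {name: i for i, name in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for g, r in neighbours.items():
        # Query <-> gene edges are weighted by correlation.
        adj[pos[query], pos[g]] = adj[pos[g], pos[query]] = r
        for t in annotations[g]:
            # Gene <-> term edges carry unit weight.
            adj[pos[g], pos[t]] = adj[pos[t], pos[g]] = 1.0
    # Random restarts always return to the query protein.
    personalization = np.zeros(len(nodes))
    personalization[0] = 1.0
    scores = pagerank(adj, personalization)
    return sorted(((t, scores[pos[t]]) for t in terms), key=lambda x: -x[1])

# Toy data: one uncharacterized protein and three annotated genes.
genes = ["uPE1_x", "geneA", "geneB", "geneC"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],    # uPE1_x (query)
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.5],  # geneA: co-expressed with the query
    [1.1, 2.1, 2.9, 4.2, 5.1, 5.9],    # geneB: co-expressed with the query
    [6.0, 1.0, 5.0, 2.0, 4.0, 3.0],    # geneC: not co-expressed
])
annotations = {
    "geneA": ["term_1", "term_2"],
    "geneB": ["term_1"],
    "geneC": ["term_3"],
}
ranked = rank_annotations(expr, genes, "uPE1_x", annotations)
for term, score in ranked:
    print(f"{term}\t{score:.4f}")
```

In this toy network, a term shared by several co-expressed genes receives rank from each of them, so it scores above a term supported by a single gene, while terms attached only to uncorrelated genes never enter the network.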