Uncharacterized proteins have been underutilized as targets for the development of
novel therapeutics for difficult-to-treat bacterial infections. To
facilitate the exploration of these proteins, 2819 predicted, uncharacterized
proteins (19.1% of the total) from reference strains of multidrug-resistant Acinetobacter baumannii, Klebsiella
pneumoniae, and Pseudomonas aeruginosa species were organized using an unsupervised k-means machine learning algorithm. Classification using normalized values
for protein length, pI, hydrophobicity, degree of conservation, structural
disorder, and %AT of the coding gene rendered six natural clusters.
Cluster proteins showed different trends regarding operon membership,
expression, presence of unknown function domains, and interactomic
relevance. Clusters 2, 4, and 5 were enriched with highly disordered
proteins, nonworkable membrane proteins, and likely spurious proteins,
respectively. Clusters 1, 3, and 6 showed closer distances to known
antigens, antibiotic targets, and virulence factors. Up to 21.8% of
proteins in these clusters were structurally covered by modeling,
which allowed assessment of druggability and discontinuous B-cell
epitopes. Five proteins (4 in Cluster 1) were potential druggable
targets for antibiotherapy. Eighteen proteins (11 in Cluster 6) were
strong B-cell and T-cell immunogen candidates for vaccine development.
In conclusion, we provide a feature-based schema to fractionate the
functional dark proteome of critical pathogens for fundamental and
biomedical purposes.
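
The clustering step summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline: the min-max scaling, the plain Lloyd-style k-means, the random seeds, and the synthetic feature values are all assumptions made for illustration, and the names `min_max_normalize`, `kmeans`, and `FEATURES` are hypothetical. Only the six feature types and the choice of six clusters come from the text.

```python
import random

# Feature order follows the abstract; the values generated below are synthetic.
FEATURES = ["length", "pI", "hydrophobicity", "conservation", "disorder", "pct_AT"]


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1] (assumed normalization scheme)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so centroids would not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic stand-in for the feature table (60 hypothetical proteins).
data_rng = random.Random(42)
proteins = [
    [
        data_rng.randint(50, 900),   # length (residues)
        data_rng.uniform(4, 11),     # pI
        data_rng.uniform(-2, 2),     # hydrophobicity
        data_rng.uniform(0, 1),      # degree of conservation
        data_rng.uniform(0, 1),      # structural disorder
        data_rng.uniform(30, 70),    # %AT of the coding gene
    ]
    for _ in range(60)
]

normalized = min_max_normalize(proteins)
labels, centroids = kmeans(normalized, k=6)
```

Normalizing before clustering keeps features with large numeric ranges, such as protein length, from dominating the Euclidean distance over features such as pI or disorder. In practice an established library implementation of k-means would likely be preferable to this hand-written loop.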