Background
Missense pharmacogenomic (PGx) variants refer to amino acid substitutions that
potentially affect the pharmacokinetic (PK) or pharmacodynamic (PD) response to
drug therapies. Compared with disease-associated variants, PGx variants have
not been investigated as deeply. The ability to computationally predict future PGx
variants is desirable; however, it is not clear what data sets should be used or
what features are beneficial to this end. Hence we carried out a comparative
characterization of PGx variants with annotated neutral and disease variants from
UniProt, to test the predictive power of sequence conservation and structural
information in discriminating these three groups.

Results
126 PGx variants of high quality from PharmGKB were selected and two data sets
were created: one set contained 416 variants with structural and sequence
information, and the other set contained 1,265 variants with sequence information
only. In terms of sequence conservation, PGx variants are more conserved than
neutral variants and much less conserved than disease variants. A weighted random
forest was used to achieve a more balanced classification of PGx variants.
In general, structural features help discriminate PGx variants from the
other two groups, but separating PGx variants from neutral polymorphisms is still
much less effective than separating disease variants from neutral variants.

Conclusions
We found that PGx variants are much more similar to neutral variants than to
disease variants in the feature space consisting of residue conservation,
neighboring residue conservation, number of neighbors, and protein solvent
accessibility. This similarity makes it very difficult to distinguish PGx
variants from neutral polymorphisms.
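
The weighted random forest mentioned in the Results compensates for the PGx class being much smaller than the other two. The abstract does not state which weighting scheme was used, so the following is only a minimal sketch, assuming inverse-frequency class weights (each class weighted by n_samples / (n_classes * class_count)). The function name `balanced_class_weights` and the class counts are hypothetical and serve illustration only; they are not the study's data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return an inverse-frequency weight for each class label.

    Each class gets n_samples / (n_classes * class_count), so rarer
    classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical class counts, chosen only to show an imbalanced setting.
labels = ["pgx"] * 10 + ["neutral"] * 30 + ["disease"] * 60
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

Under this scheme the smallest class carries the largest weight, so misclassifying one of its members costs more during training. Weights of this kind could then be passed to a random forest implementation that accepts per-class weights.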