The transmission and pathogenesis of human immunodeficiency virus type 1 (HIV-1) are disproportionately influenced by evolution in the five variable regions of the virus surface envelope glycoprotein (gp120). Insertions and deletions (indels) are a significant source of evolutionary change in these regions. However, the rate at which indels accumulate relative to nucleotide substitutions has not yet been quantified through a comparative analysis of HIV-1 sequence data. Here we develop a phylogenetic method to estimate indel rates for the gp120 variable regions across five major subtypes and two circulating recombinant forms (CRFs) of HIV-1 group M, and report the results. We processed over 26,000 published HIV-1 gp120 sequences, from which we extracted 6,605 sequences for phylogenetic analysis. In brief, our method uses maximum likelihood to reconstruct time-scaled phylogenies and fits a Poisson model to the observed distribution of indels between closely related pairs of sequences in the tree (cherries). The rate estimates ranged from 3.0 × 10⁻⁵ to 1.5 × 10⁻³ indels/nt/year and varied significantly among variable regions and subtypes. Indel rates were significantly lower in the region encoding variable loop V3, and also lower for HIV-1 subtype B than for other subtypes. We also found that variable loops V1, V2 and V4 tended to accumulate significantly longer indels. Further, the nucleotide composition of indel sequences was significantly distinct from that of the flanking sequence in HIV-1 gp120.
Indels affected potential N-linked glycosylation sites substantially more often in V1 and V2 than expected by chance, which is consistent with positive selection on glycosylation patterns within these regions of gp120. These results represent the first comprehensive measures of indel rates in HIV-1 gp120 across multiple subtypes and CRFs, and identify novel and unexpected patterns for further research in the molecular evolution of HIV-1.
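The abstract describes the rate estimate only as a Poisson fit to indel counts in cherries. A minimal sketch of such a fit follows. It rests on hypothetical assumptions: each cherry contributes an indel count k_i, the aligned length L_i of the region in nucleotides, and the summed branch length t_i (in years) separating its two tips, and a single rate λ is shared so that k_i ~ Poisson(λ L_i t_i). Under these assumptions the maximum-likelihood estimate has the closed form λ̂ = Σk_i / Σ L_i t_i, in indels/nt/year. A likelihood-ratio test (one extra parameter, χ² with 1 df) is included as one way to ask whether two groups, such as two variable regions, differ in rate. The abstract does not say which test the authors used. `Cherry`, the function names and the example numbers are illustrative, not taken from the study.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Cherry:
    """One pair of sister tips (hypothetical input format)."""
    indels: int        # indels observed between the two sister sequences
    length_nt: float   # aligned length of the variable region, in nucleotides
    time_years: float  # summed length of the two tip branches, in years


def poisson_rate_mle(cherries: Sequence[Cherry]) -> float:
    """Closed-form MLE of lambda when k_i ~ Poisson(lambda * L_i * t_i)."""
    total_indels = sum(c.indels for c in cherries)
    exposure = sum(c.length_nt * c.time_years for c in cherries)  # nt * years
    if exposure <= 0:
        raise ValueError("total exposure (nt * years) must be positive")
    return total_indels / exposure


def poisson_log_likelihood(rate: float, cherries: Sequence[Cherry]) -> float:
    """Log-likelihood of a single shared rate over all cherries."""
    ll = 0.0
    for c in cherries:
        mu = rate * c.length_nt * c.time_years  # expected indel count
        ll += -mu - math.lgamma(c.indels + 1)
        if c.indels > 0:  # k * log(mu) term; it is zero when k == 0
            ll += c.indels * math.log(mu)
    return ll


def rate_difference_lrt(group_a: Sequence[Cherry],
                        group_b: Sequence[Cherry]) -> Tuple[float, float]:
    """Likelihood-ratio test of one shared rate vs. separate rates (1 df)."""
    pooled = list(group_a) + list(group_b)
    ll_shared = poisson_log_likelihood(poisson_rate_mle(pooled), pooled)
    ll_separate = (poisson_log_likelihood(poisson_rate_mle(group_a), group_a)
                   + poisson_log_likelihood(poisson_rate_mle(group_b), group_b))
    stat = max(0.0, 2.0 * (ll_separate - ll_shared))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value


# Made-up example data, not values from the study.
region_x = [Cherry(1, 100.0, 5.0), Cherry(0, 125.0, 4.0), Cherry(2, 100.0, 5.0)]
region_y = [Cherry(20, 100.0, 5.0)]
print(poisson_rate_mle(region_x))               # 0.002 indels/nt/year
print(rate_difference_lrt(region_x, region_y))  # large statistic, small p-value
```

The closed form avoids numerical optimization: setting the derivative of Σ(k_i log(λ L_i t_i) − λ L_i t_i) with respect to λ to zero gives Σk_i/λ = Σ L_i t_i.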
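The abstract reports that indels affected potential N-linked glycosylation sites (PNGS) in V1 and V2 more often than expected by chance, but it does not define "affected". The sketch below illustrates one possible operationalization, assumed here for illustration only. It scans a pair of aligned protein sequences for the canonical sequon N-X-S/T (X any residue except proline), marks the alignment columns each sequon occupies in either sequence, and counts the indels whose columns overlap such a site. The function names, the use of `-` as the only gap character, and the overlap criterion are hypothetical choices, not the authors' procedure. A chance baseline (for example, randomly relocating each indel within the same region and recounting) is not shown.

```python
import re
from typing import List, Set, Tuple

# Canonical N-linked glycosylation sequon: N, then any residue except P, then S or T.
# The lookahead consumes only the N, so overlapping sequons (e.g. "NNST") are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_columns(aligned: str) -> Set[int]:
    """Alignment columns covered by sequons in one gapped protein sequence."""
    cols = [i for i, aa in enumerate(aligned) if aa != "-"]  # residue index -> column
    residues = "".join(aligned[i] for i in cols)
    covered: Set[int] = set()
    for m in SEQUON.finditer(residues):
        covered.update(cols[m.start():m.start() + 3])
    return covered


def indel_runs(seq_a: str, seq_b: str) -> List[Tuple[int, int]]:
    """Half-open column ranges where exactly one of the two sequences is gapped.

    Simplification: adjacent gaps in different sequences merge into one run.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        is_indel = (a == "-") != (b == "-")
        if is_indel and start is None:
            start = i
        elif not is_indel and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seq_a)))
    return runs


def indels_hitting_pngs(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (indels overlapping a sequon in either sequence, total indels)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pngs = sequon_columns(seq_a) | sequon_columns(seq_b)
    runs = indel_runs(seq_a, seq_b)
    hits = sum(1 for s, e in runs if any(c in pngs for c in range(s, e)))
    return hits, len(runs)


print(indels_hitting_pngs("ACNGTW", "AC---W"))      # (1, 1): deletion removes N-G-T
print(indels_hitting_pngs("NGTAAKLW", "NGTA--LW"))  # (0, 1): deletion misses the sequon
```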