Diarrhea is the second leading cause of death for children globally, causing 760,000 deaths each year in children under the age of 5. Amoebic dysentery contributes significantly to this burden, especially in developing countries. We hypothesized that genetic variation contributes to susceptibility to diarrhea-associated Entamoeba histolytica infection in Bangladeshi infants; we therefore conducted a genome-wide association study (GWAS) of diarrhea-associated E. histolytica infection in two independent birth cohorts. Cases were defined as children with at least one diarrheal episode positive for E. histolytica by either PCR or ELISA within the first year of life. Controls were children without any episodes positive for E. histolytica in the same time frame. Meta-analyses under a fixed-effects inverse-variance weighting model identified variants in two neighboring genes on chromosome 10, CUL2 (cullin 2) and CREM (cAMP responsive element modulator), associated with E. histolytica infection, with SNP rs58000832 achieving genome-wide significance (Pmeta=4.2×10⁻¹⁰). Each additional risk allele of rs58000832 (an intergenic insertion between CREM and CCNY) conferred 2.5-fold increased odds of a diarrhea-associated E. histolytica infection. The most strongly associated SNP within a gene was in an intron of CREM (rs58468685, Pmeta=2.3×10⁻⁹); CREM, together with CUL2, has been implicated as a susceptibility locus for inflammatory bowel disease (IBD) and Crohn's disease. Gene expression resources suggest that these loci are associated with higher expression of CREM, but not CUL2. Increased CREM expression is also observed in early E. histolytica infection. Further, CREM-/- mice were more susceptible to E. histolytica amebic colitis. These genetic associations reinforce the pathological similarities observed in gut inflammation between E. histolytica infection and IBD.
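For readers unfamiliar with fixed-effects inverse-variance weighting, the following is a minimal sketch of how per-cohort effect estimates for a single SNP can be combined. It is not the analysis code used in this study: the function name `ivw_fixed_effects` and the input values are hypothetical, and the two-sided P-value is computed from a normal approximation.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Combine per-cohort log odds ratios with inverse-variance weights.

    betas: per-cohort effect estimates (log odds ratios) for one SNP
    ses:   matching standard errors
    Returns (beta_meta, se_meta, z, p_two_sided).
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    # Pooled estimate and its standard error under the fixed-effects model.
    beta_meta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se_meta = math.sqrt(1.0 / total_weight)

    # Two-sided P-value from the standard normal distribution.
    z = beta_meta / se_meta
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_meta, se_meta, z, p_two_sided


# Hypothetical inputs for illustration only (not values from this study).
beta, se, z, p = ivw_fixed_effects([0.5, 1.5], [0.5, 0.5])
print(f"beta={beta:.3f}, se={se:.3f}, OR={math.exp(beta):.2f}, P={p:.2e}")
```

Because the weights are proportional to precision, the cohort with the smaller standard error dominates the pooled estimate; with equal standard errors, as in the illustrative call, the pooled effect is the simple average of the two cohort estimates.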
PROVIDE genetic data: Within PROVIDE, 541 children were genotyped on Illumina's Infinium Multiethnic Global Array (MEGA). Standard quality control metrics were applied to the genome-wide data. Single nucleotide polymorphism (SNP) filters included genotype missingness <5% (none), minor allele frequency (MAF) >0.5% (M=659,171), and Hardy-Weinberg equilibrium P-value >10E-5 (M=789). Individuals were filtered for individual missingness <2% (none), heterozygosity outliers (N=4), and principal components outliers (none). One individual from each first- or second-degree relative pair was removed (N=36). After both individual- and SNP-level filters, 699,246 SNPs and 499 individuals remained. The genetic data were split by chromosome for phasing and imputation. Each chromosome was phased using SHAPEIT 28,29 v2.r790 with 1000 Genomes Project Phase 3 data as the reference. 16 After phasing, the chromosomes were imputed using IMPUTE v2.3.2 30-34 with 1000 Genomes Project Phase 3 data as the reference.
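As an illustration of the SNP-level filters described above, the sketch below computes missingness, MAF, and a Hardy-Weinberg P-value from genotype counts for a single SNP. It rests on several assumptions: the function names are hypothetical, the Hardy-Weinberg test is a simple 1-degree-of-freedom chi-square test rather than whatever test the QC software applied, and the default threshold of 1e-5 is one reading of the "10E-5" cutoff stated in the text.

```python
import math


def snp_qc_metrics(n_aa, n_ab, n_bb, n_missing):
    """Return (missing_rate, maf, hwe_p) from genotype counts for one SNP."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        # No usable genotypes: report full missingness and no frequency data.
        return 1.0, 0.0, 0.0

    missing_rate = n_missing / n_total
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (
        n_called * freq_a ** 2,
        2 * n_called * freq_a * (1.0 - freq_a),
        n_called * (1.0 - freq_a) ** 2,
    )
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return missing_rate, maf, hwe_p


def passes_snp_filters(n_aa, n_ab, n_bb, n_missing,
                       max_missing=0.05, min_maf=0.005, min_hwe_p=1e-5):
    """Apply the three SNP filters; min_hwe_p is an assumed threshold."""
    missing_rate, maf, hwe_p = snp_qc_metrics(n_aa, n_ab, n_bb, n_missing)
    return missing_rate < max_missing and maf > min_maf and hwe_p > min_hwe_p


# Hypothetical genotype counts for illustration only.
print(passes_snp_filters(25, 50, 25, 0))   # common SNP in equilibrium
print(passes_snp_filters(50, 0, 50, 0))    # no heterozygotes observed
print(passes_snp_filters(499, 1, 0, 0))    # rare variant below the MAF cutoff
```

In practice these filters are run genome-wide with dedicated software, which typically uses an exact Hardy-Weinberg test; the chi-square version here is only meant to make the logic of each threshold concrete.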
E. histolytica detection protocol: The detection protocol for E. histolytica has been described previously in Haque et al. (2007). 35 Primers and TaqMan probes for E. histolytica (accession no. X64142) were d...