Background: Autism spectrum disorder (ASD) is phenotypically and genetically heterogeneous. A simulation study has shown that categorizing patients with a complex disease into more homogeneous subgroups can increase the power to elucidate hidden heritability.
Methods: We conducted cluster analyses using the k-means algorithm with a cluster number of 15, based on phenotypic variables from the Simons Simplex Collection (SSC). As a preliminary study, we conducted a conventional genome-wide association study (GWAS) with a dataset of 597 ASD cases and 370 controls. In the second step, we divided the cases based on the clustering results and conducted a GWAS in each of the subgroups vs. controls (cluster-based GWAS). In the replication stage, we also conducted cluster-based GWAS on another SSC dataset of 712 probands and 354 controls.
Results: In the preliminary study, we observed no significant associations. In the second step, the cluster-based GWASs identified 65 chromosomal loci that satisfied the threshold of P < 5.0 × 10⁻⁸: 30 intragenic loci located in 21 genes and 35 intergenic loci. Some of these loci were located within or near previously reported candidate genes for ASD: CDH5, CNTN5, CNTNAP5, DNAH17, DPP10, DSCAM, FOXK1, GABBR2, GRIN2A5, ITPR1, NTM, SDK1, SNCA and SRRM4. Of these 65 significant chromosomal loci, rs11064685, located within the SRRM4 gene, had a significantly different distribution between cases and controls in the replication cohort.
Conclusions: These findings suggest that clustering may successfully identify subgroups with relatively homogeneous disease etiologies. Further cluster validation and replication studies are warranted in larger cohorts.
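The cluster-based GWAS design described in the Methods can be illustrated with a minimal sketch: cluster the cases on phenotypic variables with k-means, then test each cluster against the shared control group at a single variant. This is a toy illustration under stated assumptions, not the study's pipeline. The abstract does not specify the association test or software, so the allelic chi-square test used here is an assumption, and the function names (`kmeans`, `allelic_chi2_p`, `cluster_based_tests`) are hypothetical. Only the cluster count of 15 and the threshold of 5.0 × 10⁻⁸ come from the text.

```python
import math
import random

GENOME_WIDE_P = 5.0e-8  # significance threshold stated in the Results
N_CLUSTERS = 15         # cluster number stated in the Methods


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of phenotype scores (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each case to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def allelic_chi2_p(case_alt, case_total, ctrl_alt, ctrl_total):
    """P-value of a 1-df chi-square test on a 2x2 table of allele counts.

    Assumption: the study's actual test statistic is not given in the abstract.
    """
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def cluster_based_tests(case_genotypes, labels, control_genotypes):
    """Test each case cluster against all controls at one variant.

    Genotypes are coded as alternate-allele counts (0, 1 or 2) per individual.
    """
    ctrl_alt = sum(control_genotypes)
    ctrl_total = 2 * len(control_genotypes)
    results = {}
    for c in sorted(set(labels)):
        g = [x for x, lab in zip(case_genotypes, labels) if lab == c]
        results[c] = allelic_chi2_p(sum(g), 2 * len(g), ctrl_alt, ctrl_total)
    return results
```

In a real analysis, `kmeans` would be called with `N_CLUSTERS`, the test would be repeated across all genotyped variants, and a cluster-level result would be flagged only if its p-value fell below `GENOME_WIDE_P`. Because each cluster contains fewer cases than the full cohort, any gain in power relies on the subgroup being more etiologically homogeneous, which is the hypothesis the study examines.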