Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications for downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, we identified ~66,000 close (first- and second-degree) relationships within DiscovEHR involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships as DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ~1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations through reconstructed pedigrees, including a tandem duplication in LDLR causing familial hypercholesterolemia. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
Key words: cryptic relatedness; pedigree reconstruction; relationship inference; identity by descent; compound heterozygous mutation phasing; de novo mutations; precision medicine; healthcare population-based genetic study; exome sequencing; family structure; familial hypercholesterolemia