“…One useful pangenomics tool for measuring non-reference variation that is readily applicable to common short-read datasets is the k-mer. K-mers are subsequences of length k derived from a larger sequence, and they have a long history of use in computer science [Shannon, 1948], genome assembly [Turner et al, 2018], metagenomics [Benoit et al, 2016], and quantitative genetics [Rahman et al, 2018, Voichek and Weigel, 2020, Kim et al, 2020, Mehrab et al, 2021]. Recent studies have also demonstrated the utility of k-mers for measuring heterozygosity and genetic differences between individuals (commonly referred to as "dissimilarity" measures; Ondov et al [2016], Vurture et al [2017], Ranallo-Benavidez et al [2020], VanWallendael and Alvarez [2022]).…”
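To make the definition concrete, here is a minimal Python sketch that extracts the overlapping k-mers of a sequence (a sequence of length L yields L - k + 1 of them), counts them, and computes one simple k-mer-based dissimilarity: 1 minus the Jaccard index of the two sets of distinct k-mers. The function names (`kmers`, `kmer_counts`, `jaccard_dissimilarity`) are hypothetical and chosen for illustration. This computes the exact Jaccard index rather than the MinHash estimate used by tools such as Mash [Ondov et al 2016]. It also skips the reverse-complement canonicalization that tools working on short reads usually apply, so it is a sketch of the idea, not any cited method.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k (len(seq) - k + 1 of them)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count how often each k-mer occurs in seq (case-insensitive, no canonicalization)."""
    return Counter(kmers(seq.upper(), k))


def jaccard_dissimilarity(a: str, b: str, k: int) -> float:
    """1 - |A & B| / |A | B|, where A and B are the sets of distinct k-mers of a and b."""
    set_a = set(kmers(a.upper(), k))
    set_b = set(kmers(b.upper(), k))
    union = set_a | set_b
    if not union:
        # Both sequences are shorter than k: treat them as identical (no evidence of difference).
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


if __name__ == "__main__":
    print(kmers("ACGTAC", 3))  # ['ACG', 'CGT', 'GTA', 'TAC']
    print(kmer_counts("AAAA", 2))  # Counter({'AA': 3})
    # ACGTAC and ACGTTC share 2 of 6 distinct 3-mers, so the dissimilarity is 2/3.
    print(jaccard_dissimilarity("ACGTAC", "ACGTTC", 3))
```

A single base substitution changes up to k overlapping k-mers. Because of this, set-based dissimilarities like this one respond to differences between individuals without first aligning reads to a reference.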