Single cell genomics is a rapidly advancing field; however, most techniques are designed for mammalian cells. Here, we present a single cell sequencing pipeline for the intracellular parasite, Plasmodium falciparum, which harbors a relatively small genome with an extremely skewed base content. Through optimization of a quasi-linear genome amplification method, we achieve more even genome coverage and better targeting of the parasite genome over contaminants. These improvements are particularly important for expanding the accessibility of single cell approaches to new organisms and cell types and for improving the study of genetic mechanisms of adaptation.

Keywords: whole-genome amplification, AT-skewed genome, malaria, single cell sequencing, MALBAC

Background

Malaria is a life-threatening disease caused by protozoan Plasmodium parasites. P. falciparum causes the greatest number of human malaria deaths [1]. The clinical symptoms of malaria occur when parasites invade human erythrocytes and undergo rounds of asexual reproduction, maturing from early forms into late stage parasites and bursting from erythrocytes to begin the cycle again [2]. In this asexual cycle, parasites possess a single haploid genome during the early stages; rapid genome replication in the later stages leads to an average of 16 genome copies [2].

Due to the lack of an effective vaccine, antimalarial drugs are required to treat malaria. However, drug efficacy is threatened by the frequent emergence of resistant populations [3]. Copy number variations (CNVs), the amplification or deletion of genomic elements, are one of the major sources of genomic variation in P. falciparum that contribute to antimalarial resistance [4–15].
As in bacteria and viruses [16–18], a high rate of CNVs may initiate genomic changes that contribute to the rapid adaptation of this organism [7,19]. Despite the importance of CNVs, their dynamics in evolving populations are not well understood.
The majority of CNVs in P. falciparum have been identified by analyzing bulk DNA in which the CNVs are present in a substantial fraction of individual parasites in the population due to positive selection [8,10,11,20,21]. However, many CNVs likely remain undetected because they are presumably either deleterious or offer no advantage for parasite growth or transmission [22] and are therefore present at low frequency [20,22]. Currently, most CNVs are identified using read-depth analysis of short read sequencing data, which derives an average signal across the population. For this reason, genetic variants must be present at high frequency (i.e., ~50%) in the population to be detected [23–25]. Sequencing at very high depth improves the detection of low frequency CNVs, but the sensitivity is limited to large-scale CNVs present in >5% of cells [26–28]. Analysis methods that rely on the detection of reads that span CNV junctions h...