Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of reference may be a source of errors that affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were compared across mapping references. The choice of reference genome had an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications, such as the inclusion or exclusion of isolates in particular clades and the estimation of genetic distances. These findings suggest that the single-reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.

Author summary
Mapping consists of aligning reads (i.e., DNA fragments) obtained through high-throughput genome sequencing to a previously assembled reference sequence. It is common practice in genomic studies to use a single reference for mapping, usually the 'reference genome' of a species, a high-quality assembly.
However, the selection of an optimal reference is hindered by intrinsic intra-species genetic variability, particularly in bacteria. Biases and errors due to reference choice for mapping in bacteria have been identified. These originate mainly from alignment errors caused by genetic differences between the reference genome and the read sequences. Ultimately, they could lead to misidentification of variants and biased reconstruction of phylogenetic trees (which reflect ancestry between different bacterial lineages). However, a systematic study of the effects of reference choice in different bacterial species is still missing, particularly regarding its impact on phylogenies. This work was intended to fill that gap. The impact of reference choice proved to be pervasive in the five bacterial species that we studied and, in some cases, alterations in phylogenetic trees could lead to incorrect epidemiological inferences. Hence, the use of different reference genomes may be advisable to assess the potential biases of mapping.

Introduction
The development and increasing availability of high-throughput sequen...