The colonization of land by plants was a pivotal event in the history of the biosphere, yet the underlying evolutionary features and innovations of the first land plant ancestors are not well understood. Here we present the genome sequence of the unicellular alga Penium margaritaceum, a member of the Zygnematophyceae, the sister lineage to land plants. The P. margaritaceum genome has a high proportion of repeat sequences, which are associated with massive segmental gene duplications, likely facilitating neofunctionalization. Compared with earlier-diverging plant lineages, P. margaritaceum has uniquely expanded repertoires of gene families, signaling networks and adaptive responses, supporting its phylogenetic placement and highlighting the evolutionary trajectory towards terrestrialization. These encompass a broad range of physiological processes and cellular structures, such as large families of extracellular polymer biosynthetic and modifying enzymes involved in cell wall assembly and remodeling. Transcriptome profiling of cells exposed to conditions common in terrestrial habitats, namely high light and desiccation, further elucidated key adaptations to the semi-aquatic ecosystems that are home to the Zygnematophyceae. Such habitats, in which a simpler body plan would be advantageous, likely provided the evolutionary crucible in which selective pressures shaped the transition to land. Earlier-diverging charophyte lineages with more complex, land plant-like anatomies have either remained exclusively aquatic or developed alternative lifestyles that allow

of 116.1 kb. The nuclear assembly captured most of the k-mers in the Illumina reads, and low-frequency k-mers representing sequencing errors were absent (Supplementary Fig. 1B).
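The k-mer check can be illustrated with a minimal Python sketch. It does not reproduce the authors' pipeline (analyses at genome scale normally rely on dedicated counters such as Jellyfish or KAT); the function names, the toy value k = 5 and the solidity threshold `min_solid` are hypothetical choices for illustration. The idea is that k-mers seen many times in the reads ("solid") should appear in a good assembly, whereas k-mers seen only once or twice mostly arise from sequencing errors and should not.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers, skipping any window containing non-ACGT bases."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def spectrum(counts: Counter) -> dict[int, int]:
    """k-mer spectrum: multiplicity -> number of distinct k-mers with that multiplicity."""
    return dict(sorted(Counter(counts.values()).items()))


def assembly_support(read_counts: Counter, asm_kmers: set, min_solid: int) -> tuple[float, float]:
    """Return (share of solid read k-mers found in the assembly,
    share of low-frequency read k-mers found in the assembly)."""
    solid = {km for km, c in read_counts.items() if c >= min_solid}
    weak = {km for km, c in read_counts.items() if c < min_solid}
    captured = len(solid & asm_kmers) / len(solid) if solid else 0.0
    weak_in_asm = len(weak & asm_kmers) / len(weak) if weak else 0.0
    return captured, weak_in_asm


if __name__ == "__main__":
    # Toy data: three error-free reads plus one read with a single substitution.
    genome = "ACGGTCATTGAC"
    reads = [genome] * 3 + ["ACGGTCGTTGAC"]
    read_counts = count_kmers(reads, k=5)
    asm_kmers = set(count_kmers([genome], k=5))
    print(spectrum(read_counts))                               # {1: 5, 3: 5, 4: 3}
    print(assembly_support(read_counts, asm_kmers, min_solid=2))  # (1.0, 0.0)
```

In the toy run, every solid k-mer is present in the "assembly" and none of the five error k-mers are, which is the pattern described for the real spectrum. Real analyses typically use larger k (around 21 or more) so that k-mers are mostly unique in the genome.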
In addition, the mapping rates of genomic and RNA-Seq reads against the nuclear assembly were 97.5% and 96.8%, respectively (Supplementary Table 2). The single nucleotide polymorphism (SNP) frequency distribution on the 100 longest scaffolds was consistent with a haploid genome (Supplementary Fig. 1C). The mitochondrial and chloroplast genomes were also fully assembled, and comprised 95,332 and 145,411 nucleotides, respectively (Supplementary Fig. 2).
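The reasoning behind the haploidy check can be sketched briefly. In a diploid genome, reads at heterozygous sites split roughly evenly between two alleles, so the alternate-allele read fraction shows a peak near 0.5; a haploid genome has only one allele per site, so that intermediate peak is absent and fractions cluster near 0 or 1. The Python sketch below assumes per-site (reference, alternate) read counts have already been extracted from alignments to the chosen scaffolds; the depth filter, the 0.3 to 0.7 window and the 10% cutoff are hypothetical values, not the authors' criteria.

```python
from typing import Sequence, Tuple


def allele_fractions(sites: Sequence[Tuple[int, int]], min_depth: int = 10) -> list[float]:
    """Alternate-allele read fraction per site; sites below min_depth are dropped."""
    fractions = []
    for ref_reads, alt_reads in sites:
        depth = ref_reads + alt_reads
        if depth >= min_depth:
            fractions.append(alt_reads / depth)
    return fractions


def histogram(fractions: Sequence[float], bins: int = 10) -> list[int]:
    """Equal-width histogram on [0, 1]; a fraction of exactly 1.0 goes in the last bin."""
    counts = [0] * bins
    for f in fractions:
        counts[min(int(f * bins), bins - 1)] += 1
    return counts


def intermediate_fraction(fractions: Sequence[float], low: float = 0.3, high: float = 0.7) -> float:
    """Share of sites whose allele fraction lies in [low, high], i.e. near 0.5."""
    if not fractions:
        return 0.0
    return sum(low <= f <= high for f in fractions) / len(fractions)


def looks_haploid(fractions: Sequence[float], max_intermediate: float = 0.1) -> bool:
    """Heuristic (hypothetical cutoff): a haploid genome should have few sites near 0.5."""
    return intermediate_fraction(fractions) <= max_intermediate


if __name__ == "__main__":
    haploid_like = allele_fractions([(0, 30), (1, 29), (28, 2), (0, 25), (3, 2)])
    diploid_like = allele_fractions([(15, 15), (12, 18), (20, 10), (0, 30)])
    print(histogram(haploid_like), looks_haploid(haploid_like))
    print(histogram(diploid_like), looks_haploid(diploid_like))
```

A single cutoff is only a summary; inspecting the full histogram, as in a frequency-distribution plot, is what reveals whether a 0.5 peak is present.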
The assembly contains a large proportion (80.6%) of repeat sequences (Supplementary Table 3), particularly long terminal repeat (LTR) retrotransposons and simple repeats (Fig. 2A).
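Insertions of LTR retrotransposons, such as those that dominate this assembly, are commonly dated from the divergence between an element's 5' and 3' LTRs, which are identical at insertion and then accumulate substitutions independently. The source does not state its dating method or substitution rate, so the Python sketch below is an assumption: it computes the Kimura two-parameter distance K between two pre-aligned LTRs and converts it to an age with T = K / (2r), where r is a per-site, per-year substitution rate that the caller must supply. The function names are hypothetical.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def kimura_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two aligned, equal-length LTRs.

    K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the
    proportions of transitions and transversions. Gapped or ambiguous
    columns are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a != b:
            if frozenset((a, b)) in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_years(k: float, rate: float) -> float:
    """T = K / (2r): both LTRs have diverged independently since insertion."""
    return k / (2 * rate)
```

The estimated age scales inversely with the chosen r, so any rate plugged in here is illustrative; in practice the two LTRs of each element are first aligned with a standard aligner before the distance is computed.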
Unlike land plants and C. braunii, in which gypsy is the predominant LTR family, the P. margaritaceum genome has a large proportion of copia retrotransposons, which are rare in other green algae and absent from C. braunii (Nishiyama et al. 2018). An estimation of divergence time indicated that the copia expansion in the P. margaritaceum genome was relatively recent, around 2.1 Mya (Fig. 2B). Retrotransposons carrying tyrosine recombinases, such as the DIRS and Ngaro families, which are found in some chlorophyte...