The analysis of the first plant genomes provided unexpected evidence for genome duplication events in species that had previously been considered as true diploids on the basis of their genetics [1][2][3]. These polyploidization events may have had important consequences in plant evolution, in particular for species radiation and adaptation and for the modulation of functional capacities [4-10]. Here we report a high-quality draft of the genome sequence of grapevine (Vitis vinifera) obtained from a highly homozygous genotype. The draft sequence of the grapevine genome is the fourth one produced so far for flowering plants, the second for a woody species and the first for a fruit crop (cultivated for both fruit and beverage). Grapevine was selected because of its important place in the cultural heritage of humanity beginning during the Neolithic period [11]. Several large expansions of gene families with roles in aromatic features are observed. The grapevine genome has not undergone recent genome duplication, thus enabling the discovery of ancestral traits and features of the genetic organization of flowering plants. This analysis reveals the contribution of three ancestral genomes to the grapevine haploid content. This ancestral arrangement is common to many dicotyledonous plants but is absent from the genome of rice, which is a monocotyledon. Furthermore, we explain the chronology of previously described whole-genome duplication events in the evolution of flowering plants.
All grapevine varieties are highly heterozygous; preliminary data showed that there was as much as 13% sequence divergence between alleles, which would hinder reliable contig assembly when a whole-genome shotgun strategy was used for sequencing. Our consortium therefore selected the grapevine PN40024 genotype for sequencing. 
This line, originally derived from Pinot Noir, has been bred close to full homozygosity (estimated at about 93%) by successive selfings, permitting a high-quality whole-genome shotgun assembly.
A total of 6.2 million end-reads were produced by our consortium, representing an 8.4-fold coverage of the genome. Within the assembly, performed with Arachne [12], 316 supercontigs represent putative allelic haplotypes that constitute 11.6 million bases (Mb). These values agree well with the 7% residual heterozygosity of PN40024 assessed by using genetic markers. When considering only one of the haplotypes in each heterozygous region, the assembly (Table 1a) consists of 19,577 contigs (N50 = 65.9 kilobases (kb), where N50 corresponds to the size of the shortest supercontig or contig in a subset representing half of the assembly size) and 3,514 supercontigs (N50 = 2.07 Mb) totalling 487 Mb. This value is close to the 475 Mb previously reported for the grapevine genome size [13].
Using a set of 409 molecular markers from the reference grapevine map [14], 69% of the assembled 487 Mb, arranged into 45 ultracontigs
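The N50 statistic defined in the passage above can be illustrated with a minimal Python sketch. The function name `n50` and the toy lengths are assumptions made for illustration; they are not taken from the grapevine assembly or from Arachne.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or supercontig) lengths.

    Sequences are taken from longest to shortest until they cover at least
    half of the total assembly size; the N50 is the length of the shortest
    sequence in that subset.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy data (hypothetical): total size 300, so the subset must cover 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 reaches 150, so this prints 70
```

Because the statistic depends only on the length distribution, it can be computed separately for contigs and supercontigs, which is how the two N50 values quoted above are reported.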
Rosaceae is the most important fruit-producing clade, and its key commercially relevant genera (Fragaria, Rosa, Rubus and Prunus) show broadly diverse growth habits, fruit types and compact diploid genomes. Peach, a diploid Prunus species, is one of the best genetically characterized deciduous trees. Here we describe the high-quality genome sequence of peach obtained from a completely homozygous genotype. We obtained a complete chromosome-scale assembly using Sanger whole-genome shotgun methods. We predicted 27,852 protein-coding genes, as well as noncoding RNAs. We investigated the path of peach domestication through whole-genome resequencing of 14 Prunus accessions. The analyses suggest major genetic bottlenecks that have substantially shaped peach genome diversity. Furthermore, comparative analyses showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.
Background: Next Generation Sequencing technologies are able to provide high genome coverage at a relatively low cost. However, due to limited read length (from 30 bp up to 200 bp), specific bioinformatics problems have become even more difficult to solve. De novo assembly with short reads, for example, is more complicated for at least two reasons: first, the overall amount of "noisy" data to cope with has increased and, second, as read length decreases the number of unsolvable repeats grows. Our aim is to go to the root of the problem by providing a pre-processing tool capable of producing (in silico) longer and highly accurate sequences from a collection of Next Generation Sequencing reads.
Results: In this paper a seed-and-extend local assembler is presented. The kernel algorithm is a loop that, starting from a read used as seed, keeps extending it using heuristics whose main goal is to produce a collection of error-free and longer sequences. In particular, GapFiller carefully detects reliable overlaps and clusters similar reads in order to reconstruct the missing part between the two ends of the same insert. Our tool's output has been validated on 24 experiments using both simulated and real paired-read datasets. The output sequences are declared correct when the seed-mate is found. In the experiments performed, GapFiller was able to extend high percentages of the processed seeds and find their mates, with a false-positive rate that turned out to be nearly negligible.
Conclusions: GapFiller, starting from a sufficiently high short-read coverage, is able to produce high coverage of accurate longer sequences (from 300 bp up to 3500 bp). The procedure to perform safe extensions, together with the mate-found check, turned out to be a powerful criterion to guarantee contigs' correctness. 
GapFiller has further potential, as it could be applied in a number of different scenarios, including the post-processing validation of insertions/deletions detection pipelines, pre-processing routines on datasets for de novo assembly pipelines, or in any hierarchical approach designed to assemble, analyse or validate pools of sequences.
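The seed-and-extend loop with a mate-found check described above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions (error-free reads, exact suffix-prefix overlaps, extension one base at a time), not GapFiller's actual heuristics; the names `candidate_extensions`, `extend_seed`, `min_overlap` and `max_len` are hypothetical.

```python
def candidate_extensions(contig, reads, min_overlap):
    """Collect the suffixes that overlapping reads would append to `contig`."""
    extensions = set()
    for read in reads:
        # Try the longest overlap first: a contig suffix equal to a read prefix.
        longest = min(len(contig), len(read) - 1)
        for k in range(longest, min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                extensions.add(read[k:])
                break
    return extensions


def extend_seed(seed, reads, mate, min_overlap=4, max_len=200):
    """Extend `seed` to the right until `mate` is reached or no safe step exists.

    Returns (contig, mate_found). An extension step is taken only when all
    overlapping reads agree on the next base; otherwise the loop stops.
    """
    contig = seed
    while len(contig) < max_len:
        if contig.endswith(mate):
            return contig, True
        extensions = candidate_extensions(contig, reads, min_overlap)
        next_bases = {ext[0] for ext in extensions}
        if len(next_bases) != 1:
            # No overlapping read, or reads disagree: stop rather than guess.
            return contig, False
        contig += next_bases.pop()
    return contig, contig.endswith(mate)


# Toy insert (hypothetical sequence) covered by 8 bp reads sampled every 2 bp.
genome = "ATGCGTACCTGAAGTCCGATTGCA"
reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 2)]
contig, mate_found = extend_seed(genome[:8], reads, mate=genome[-8:])
print(contig == genome, mate_found)
```

Stopping at the first disagreement trades contig length for accuracy, which mirrors the emphasis on safe extensions in the abstract; a real tool would also have to handle sequencing errors and reverse-complemented mates.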
A collection of 1005 grapevine accessions was genotyped at 34 microsatellite loci (SSR) with the aim of analysing genetic diversity and exploring parentages. The comparison of molecular profiles revealed 200 groups of synonymy. The removal of perfect synonyms reduced the database to 745 unique genotypes, on which population genetic parameters were calculated. The analysis of kinship uncovered 74 complete pedigrees, with both parents identified. Many of these parentages were not previously known and are of considerable historical interest, e.g. Chenin blanc (Sauvignon × Traminer rot), Covè (Harslevelu selfed), Incrocio Manzoni 2-14 and 2-15 (Cabernet franc × Prosecco), Lagrein (Schiava gentile × Teroldego), Malvasia nera of Bolzano (Perera × Schiava gentile), Manzoni moscato (Raboso veronese × Moscato d'Amburgo), Moscato violetto (Moscato bianco × Duraguzza), Muscat of Alexandria (Muscat blanc à petit grain × Axina de tres bias) and others. Statistical robustness of unexpected pedigrees was reinforced with the analysis of an additional 7-30 SSRs. Grouping the accessions by profile resulted in a weak correlation with their geographical origin and/or current area of cultivation, revealing a large admixture of local varieties with those most widely cultivated, as a result of ancient commerce and population flow. The SSRs with tri- to penta-nucleotide repeats adopted for the present study showed a great capacity for discriminating amongst accessions, with probabilities of identity by chance as low as 1.45 × 10^-27 and 9.35 × 10^-12 for unrelated and full-sib individuals, respectively. A database of allele frequencies and SSR profiles of 32 reference cultivars is provided.
The problem of determining the coarsest partition stable with respect to a given binary relation is known to be equivalent to the problem of finding the maximal bisimulation on a given structure. This equivalence has suggested efficient algorithms for computing the maximal bisimulation relation. In this paper the simulation problem is rewritten in terms of the coarsest stable partition problem, allowing a more algebraic understanding of simulation equivalence. On this ground, a new algorithm for deciding simulation is proposed. The procedure improves on either the space or the time complexity of previous simulation algorithms.
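To make the coarsest stable partition problem concrete, here is a minimal Python sketch of naive partition refinement. It is neither the simulation algorithm proposed in the paper nor one of the efficient bisimulation algorithms the abstract alludes to; it only shows what "stable" means. A partition is taken to be stable when every block lies either entirely inside or entirely outside the set of predecessors of every block. The function name and the toy graph are assumptions.

```python
def coarsest_stable_partition(states, edges, initial_partition):
    """Refine `initial_partition` until it is stable with respect to `edges`.

    `edges` is an iterable of (source, target) pairs. Returns a set of
    frozensets, one per block of the refined partition.
    """
    succ = {s: set() for s in states}
    for src, dst in edges:
        succ[src].add(dst)

    partition = [frozenset(block) for block in initial_partition]
    changed = True
    while changed:
        changed = False
        for splitter in partition:
            # States with at least one successor inside the splitter block.
            pre = {s for s in states if succ[s] & splitter}
            refined = []
            for block in partition:
                inside, outside = block & pre, block - pre
                if inside and outside:
                    # The block is not stable with respect to the splitter.
                    refined += [inside, outside]
                    changed = True
                else:
                    refined.append(block)
            if changed:
                partition = refined
                break  # restart with the refined partition
    return set(partition)


# Toy structure (hypothetical): a -> c, b -> c, c -> d.
blocks = coarsest_stable_partition(
    states={"a", "b", "c", "d"},
    edges=[("a", "c"), ("b", "c"), ("c", "d")],
    initial_partition=[{"a", "b", "c", "d"}],
)
print(sorted(sorted(block) for block in blocks))
```

In the toy structure, `a` and `b` end up in the same block because they cannot be told apart by their successors, which is the sense in which the coarsest stable partition coincides with the maximal bisimulation.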
Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. Interestingly, we characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: when sorted, the strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. We proceed by proving several results related to Wheeler automata: (i) We show that every Wheeler NFA (WNFA) with n states admits an equivalent Wheeler DFA (WDFA) with at most 2n − 1 − |Σ| states (Σ being the alphabet) that can be computed in O(n^3) time. This is in sharp contrast with general NFAs (where the blow-up could be exponential). (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an O(n log n)-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By contribution (i), our algorithms can also be used to index any WNFA at the moderate price of doubling the automaton's size. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in O(n log n) time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time. 
Our contributions imply new results of independent interest. Contributions (i-iii) extend the universe of known regular languages for which membership can be tested efficiently [Backurs and Indyk, FOCS 2016] and provide a new class of NFAs for which the minimization problem can be approximated within a constant factor in polynomial time. In general, the NFA minimization problem does not admit a polynomial-time o(n)-approximation unless P=PSPACE. Contribution (iv) is a big step towards a complete solution to the well-studied problem of indexing graphs for linear-time pattern matching queries: our algorithm provides a provably minimum-size solution for the deterministic-acyclic case.
We wish to thank Travis Gagie for introducing us to the problem and for stimulating discussions. Supported by the project MIUR-SIR CMACBioSeq ("Combinatorial methods for analysis and compression of biological sequences") grant n. RBSI146R5L.
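The two notions this abstract builds on, co-lexicographic ordering and a Wheeler ordering of the states of a labeled graph, can be illustrated with a short Python sketch. The checker follows the usual statement of the Wheeler graph conditions (states without incoming edges come first; for edges (u, v, a) and (u', v', a'), a < a' implies v < v', and a = a' with u < u' implies v ≤ v'). It is a brute-force check written for illustration, not one of the sorting algorithms from the paper, and the names `colex_sorted` and `is_wheeler_order` are assumptions.

```python
def colex_sorted(strings):
    """Sort strings co-lexicographically, i.e. comparing them right to left."""
    return sorted(strings, key=lambda s: s[::-1])


def is_wheeler_order(order, edges):
    """Check whether `order` (a list of states) is a Wheeler order.

    `edges` is a list of (source, target, label) triples; labels are
    compared with the ordinary `<` of their type.
    """
    rank = {state: i for i, state in enumerate(order)}
    has_incoming = {dst for _, dst, _ in edges}

    # States with no incoming edges must precede all other states.
    seen_incoming = False
    for state in order:
        if state in has_incoming:
            seen_incoming = True
        elif seen_incoming:
            return False

    # Quadratic pairwise check of the two edge conditions.
    for u1, v1, a1 in edges:
        for u2, v2, a2 in edges:
            if a1 < a2 and not rank[v1] < rank[v2]:
                return False
            if a1 == a2 and rank[u1] < rank[u2] and not rank[v1] <= rank[v2]:
                return False
    return True


# Toy automaton (hypothetical): 0 -a-> 1 -b-> 2.
toy_edges = [(0, 1, "a"), (1, 2, "b")]
print(is_wheeler_order([0, 1, 2], toy_edges))
print(is_wheeler_order([0, 2, 1], toy_edges))
print(colex_sorted(["ab", "ba", "aa", "b"]))
```

The pairwise check is quadratic in the number of edges and only verifies a given order; finding such an order efficiently is the harder question the sorting results in (ii) address.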
A central claim of computational systems biology is that, by drawing on mathematical approaches developed in the context of dynamic systems, kinetic analysis, computational theory and logic, it is possible to create powerful simulation, analysis, and reasoning tools for working biologists to decipher existing data, devise new experiments, and ultimately to understand functional properties of genomes, proteomes, cells, organs, and organisms. In this article, a novel computational tool is described that achieves many of the goals of this new discipline. The novelty of this system lies in an automaton-based semantics of the temporal evolution of complex biochemical reactions, starting from their representation as a set of differential equations. The related tools also provide the ability to reason qualitatively about the systems using a propositional temporal logic that can express an ordered sequence of events succinctly and unambiguously. The implementation of mathematical and computational models in the Simpathica and XSSYS systems is described briefly. Several example applications of these systems to cellular and biochemical processes are presented: the two most prominent are Leibler et al.'s repressilator (an artificial synthesized oscillatory network), and Curto-Voit-Sorribas-Cascante's purine metabolism reaction model.
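The idea of checking an ordered sequence of events against a simulated time course can be sketched as follows. This is a toy illustration only: it does not reproduce the temporal logic, syntax, or automaton construction of Simpathica or XSSYS, and the function name, the trace, and the thresholds are hypothetical.

```python
def occurs_in_order(trace, events):
    """Return True if the predicates in `events` become true in the given order.

    `trace` is a list of states (here, dicts of species concentrations) sampled
    along a simulation; `events` is a list of predicates over a single state.
    Consecutive events may be satisfied by the same state (non-strict order).
    """
    position = 0
    for event in events:
        # Scan forward to the first state at or after `position` where the event holds.
        while position < len(trace) and not event(trace[position]):
            position += 1
        if position == len(trace):
            return False
    return True


# Toy trace (hypothetical values) for two species x and y.
trace = [
    {"x": 0.1, "y": 0.9},
    {"x": 0.6, "y": 0.5},
    {"x": 0.9, "y": 0.1},
    {"x": 0.4, "y": 0.7},
]


def x_high(state):
    return state["x"] > 0.8


def y_high(state):
    return state["y"] > 0.6


print(occurs_in_order(trace, [x_high, y_high]))          # x peaks, then y peaks
print(occurs_in_order(trace, [x_high, y_high, x_high]))  # no second x peak in this trace
```

A query of this shape corresponds to a nested "eventually" property over a single finite trace; reasoning over all behaviours of a model, as the abstract describes, requires the automaton-based semantics rather than a scan of one trace.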