T-DNA insertion mutants are very valuable for reverse genetics in Arabidopsis thaliana. Several projects have generated large sequence-indexed collections of T-DNA insertion lines, of which GABI-Kat is the second largest resource worldwide. User access to the collection and its Flanking Sequence Tags (FSTs) is provided by the front end SimpleSearch (http://www.GABI-Kat.de). Several significant improvements have been implemented recently. The database now relies on the TAIRv10 genome sequence and annotation dataset. All FSTs have been newly mapped using an optimized procedure that leads to improved accuracy of insertion site predictions. A fraction of the collection with weak FST yield was re-analysed by generating new FSTs. Along with newly found predictions for older sequences, about 20 000 new FSTs were included in the database. Information is included about groups of FSTs that point to the same insertion site, which is found in several lines but is real in only a single line, and many problematic FST-to-line links have been corrected using new wet-lab data. SimpleSearch currently contains data from ∼71 000 lines with predicted insertions covering 62.5% of the 27 206 nuclear protein coding genes, and offers insertion allele-specific data from 9545 confirmed lines that are available from the Nottingham Arabidopsis Stock Centre.
Transformation by Agrobacterium tumefaciens, an important tool in modern plant research, involves the integration of T-DNA initially present on a plasmid in agrobacteria into the genome of plant cells. The attachment of the agrobacteria to plant cells and the transport of T-DNA into the cell and further to the nucleus have been well described. However, the exact mechanism of integration into the host's DNA is still unclear, although several models have been proposed. During confirmation of T-DNA insertion alleles from the GABI-Kat collection of Arabidopsis thaliana mutants, we have generated about 34,000 sequences from the junctions between inserted T-DNA and adjacent genome regions. Here, we describe the evaluation of this dataset with regard to existing models for T-DNA integration. The results suggest that integration into the plant genome is mainly mediated by the endogenous plant DNA repair machinery. The observed integration events showed characteristics highly similar to those of repair sites of double-strand breaks with respect to microhomology and deletion sizes. In addition, we describe unexpected integration events, such as large deletions and inversions at the integration site, that are relevant for correct interpretation of results from T-DNA insertion mutants in reverse genetics experiments.
Microscopic organisms are the dominant and most diverse organisms on Earth. Nematodes, as part of this microscopic diversity, are by far the most abundant animals and their diversity is equally high. Molecular metabarcoding is often applied to study the diversity of microorganisms, but has yet to become the standard to determine nematode communities. As such, the information that metabarcoding provides, for instance in terms of species coverage and taxonomic resolution, and especially whether sequence reads can be linked to the abundance or biomass of nematodes in a sample, has yet to be determined. Here, we applied metabarcoding using three primer sets located within ribosomal RNA gene regions to target assembled mock communities consisting of 18 different nematode species that we established in 9 different compositions. We determined abundances and biomass of all species added to examine if relative species abundance or biomass can be linked to relative sequence reads. We found that nematode communities are not equally represented by the three different primer sets, and that relative read abundances were almost perfectly positively correlated with relative species biomass for two of the primer sets. This strong biomass-read number correlation suggests that metabarcoding reads can reveal biomass information even amongst more complex nematode communities as present in the environment, and that the approach can possibly be transferred to better study other groups of organisms. This biomass-read link is of particular importance for more reliably assessing nutrient flow through food-webs, as well as adjusting biogeochemical models through user-friendly and easily obtainable metabarcoding data.
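The comparison of relative read abundance with relative biomass described in this abstract can be illustrated with a minimal sketch. It assumes a plain Pearson correlation; the abstract does not state which statistic the authors used, and the biomass values and read counts below are hypothetical numbers chosen only for illustration, not data from the study.

```python
from math import sqrt


def relative(values):
    """Convert absolute values (biomass, read counts) to fractions of the total."""
    total = sum(values)
    if total <= 0:
        raise ValueError("total must be positive")
    return [v / total for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical mock community of four species (illustrative values only).
biomass_ug = [12.0, 3.5, 0.8, 6.2]      # assumed biomass per species
read_counts = [5900, 1800, 350, 3100]   # assumed sequence reads per species

r = pearson(relative(biomass_ug), relative(read_counts))
print(f"Pearson r between relative biomass and relative reads: {r:.3f}")
```

Both inputs are normalised to fractions first, so the comparison is between community compositions and is independent of sequencing depth or total sample biomass.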
Background More than 90% of the Arabidopsis thaliana genes are members of multigene families. DNA sequence similarities present in such related genes can cause trouble, e.g. when molecularly analysing mutant alleles of these genes. Also, flanking-sequence-tag (FST) based predictions of T-DNA insertion positions are often located within paralogous regions of the genome. In such cases, the prediction of the correct insertion site must include careful sequence analyses on the one hand and a paralog-specific primer design for experimental confirmation of the prediction on the other hand. Results GABI-Kat is a large A. thaliana insertion line resource, which uses in-house confirmation to provide highly reliable access to T-DNA insertion alleles. To offer trustworthy mutant alleles of paralogous loci, we considered multiple insertion site predictions for single FSTs and implemented this 1-to-N relation in our database. The resulting paralogous predictions were addressed experimentally and the correct insertion locus was identified in most cases, including cases in which there were multiple predictions with identical prediction scores. A new primer design tool that takes paralogous regions into account was developed to streamline the confirmation process for paralogs. The tool is suitable for all parts of the genome and is freely available at the GABI-Kat website. Although the tool was initially designed for the analysis of T-DNA insertion mutants, it can be used for any experiment that requires locus-specific primers for the A. thaliana genome. It is easy to use and is also able to design amplimers with two genome-specific primers, as required for genotyping segregating families of insertion mutants when looking for homozygous offspring. Conclusions The paralog-aware confirmation process significantly improved the reliability of the insertion site assignment when paralogous regions of the genome were affected.
An automatic online primer design tool that incorporates experience from the in-house confirmation of T-DNA insertion lines has been made available. It provides easy access to primers for the analysis of T-DNA insertion alleles, and it is also beneficial for other applications.
Background Experimental proof of gene function assignments in plants is heavily based on mutant analyses. T-DNA insertion lines have provided an invaluable resource of mutants and have enabled systematic reverse genetics-based investigation of the functions of Arabidopsis thaliana genes during the last decades. Results We sequenced the genomes of 14 A. thaliana GABI-Kat T-DNA insertion lines, which eluded flanking sequence tag-based attempts to characterize their insertion loci, with Oxford Nanopore Technologies (ONT) long reads. Complex T-DNA insertions were resolved and 11 previously unknown T-DNA loci identified, suggesting that the number of T-DNA insertions per line was underestimated. T-DNA mutagenesis caused fusions of chromosomes along with compensating translocations to keep the gene set complete throughout meiosis. Also, an inverted duplication of 800 kbp was detected. About 10% of GABI-Kat lines might be affected by chromosomal rearrangements, some of which do not involve T-DNA. Local assembly of selected reads was shown to be a computationally effective method to resolve the structure of T-DNA insertion loci. We developed an automated workflow to support investigation of long read data from T-DNA insertion lines. All steps from DNA extraction to assembly of T-DNA loci can be completed within days. Conclusion Long read sequencing was demonstrated to be a very effective way to resolve complex T-DNA insertions and chromosome fusions. Many T-DNA insertions comprise not just a single T-DNA, but complex arrays of multiple T-DNAs. It is becoming obvious that T-DNA insertion alleles must be characterized by exact identification of both T-DNA::genome junctions to generate clear genotype-to-phenotype relations.