Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.Evolutionarily related genes (homologs) across different species are often divided into gene pairs that originated through speciation events (orthologs) and gene pairs that originated through duplication events (paralogs) 1 . This distinction is useful in a broad range of contexts, including phylogenetic tree inference, genome annotation, comparative genomics and gene function prediction 2-4 . Accordingly, dozens of methods 5 and resources 6-8 for orthology inference have been developed.Because the true evolutionary history of genes is typically unknown, assessing the performance of these orthology inference methods is not straightforward. Several indirect approaches have been proposed. Based on the notion that orthologs tend to be functionally more similar than paralogs (a notion now referred to as the ortholog conjecture 9-12 ), Hulsen et al. 13 used several measures of functional conservation (coexpression levels, protein-protein interactions and protein domain conservation) to benchmark orthology inference methods. Chen et al. 14 proposed an unsupervised learning approach based on consensus among different orthology methods. Altenhoff and Dessimoz 15 introduced a phylogenetic benchmark measuring the concordance between gene trees reconstructed from putative orthologs and undisputed species trees. More recently, several 'gold standard' reference sets, either manually curated 16,17 or derived from trusted resources 18 , have been used as benchmarks. Finally, Dalquen et al. 19 used simulated genomes to assess orthology inference in the presence of varying amounts of duplication, lateral gene transfer and sequencing artifacts.This wide array of benchmarking approaches poses considerable challenges to developers and users of orthology methods. Conceptually, the choice of an appropriate benchmark strongly depends on the application at hand. Practically, most methods are not available as stand-alone programs and thus cannot easily be
Multiple comparison or alignmentof protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignmentmethods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However,we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context,are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.
DNA methods are useful to identify ingested prey items from the gut of predators, but reliable detection is hampered by low amounts of degraded DNA. PCR-based methods can retrieve minute amounts of starting material but suffer from amplification biases and cross-reactions with the predator and related species genomes. Here, we use PCR-free direct shotgun sequencing of total DNA isolated from the gut of the harlequin ladybird Harmonia axyridis at five time points after feeding on a single pea aphid Acyrthosiphon pisum. Sequence reads were matched to three reference databases: Insecta mitogenomes of 587 species, including H. axyridis sequenced here; A. pisum nuclear genome scaffolds; and scaffolds and complete genomes of 13 potential bacterial symbionts. Immediately after feeding, multicopy mtDNA of A. pisum was detected in tens of reads, while hundreds of matches to nuclear scaffolds were detected. Aphid nuclear DNA and mtDNA decayed at similar rates (0.281 and 0.11 h(-1) respectively), and the detectability periods were 32.7 and 23.1 h. Metagenomic sequencing also revealed thousands of reads of the obligate Buchnera aphidicola and facultative Regiella insecticola aphid symbionts, which showed exponential decay rates significantly faster than aphid DNA (0.694 and 0.80 h(-1) , respectively). However, the facultative aphid symbionts Hamiltonella defensa, Arsenophonus spp. and Serratia symbiotica showed an unexpected temporary increase in population size by 1-2 orders of magnitude in the predator guts before declining. Metagenomics is a powerful tool that can reveal complex relationships and the dynamics of interactions among predators, prey and their symbionts.
Field-collected specimens of invertebrates are regularly killed and preserved in ethanol, prior to DNA extraction from the specimens, while the ethanol fraction is usually discarded. However, DNA may be released from the specimens into the ethanol, which can potentially be exploited to study species diversity in the sample without the need for DNA extraction from tissue. We used shallow shotgun sequencing of the total DNA to characterize the preservative ethanol from two pools of insects (from a freshwater habitat and terrestrial habitat) to evaluate the efficiency of DNA transfer from the specimens to the ethanol. In parallel, the specimens themselves were subjected to bulk DNA extraction and shotgun sequencing, followed by assembly of mitochondrial genomes for 39 of 40 species in the two pools. Shotgun sequencing from the ethanol fraction and read-matching to the mitogenomes detected ~40% of the arthropod species in the ethanol, confirming the transfer of DNA whose quantity was correlated to the biomass of specimens. The comparison of diversity profiles of microbiota in specimen and ethanol samples showed that 'closed association' (internal tissue) bacterial species tend to be more abundant in DNA extracted from the specimens, while 'open association' symbionts were enriched in the preservative fluid. The vomiting reflex of many insects also ensures that gut content is released into the ethanol, which provides easy access to DNA from prey items. Shotgun sequencing of DNA from preservative ethanol provides novel opportunities for characterizing the functional or ecological components of an ecosystem and their trophic interactions.
Characterizing trophic networks is fundamental to many questions in ecology, but this typically requires painstaking efforts, especially to identify the diet of small generalist predators. Several attempts have been devoted to develop suitable molecular tools to determine predatory trophic interactions through gut content analysis, and the challenge has been to achieve simultaneously high taxonomic breadth and resolution. General and practical methods are still needed, preferably independent of PCR amplification of barcodes, to recover a broader range of interactions. Here we applied shotgun-sequencing of the DNA from arthropod predator gut contents, extracted from four common coccinellid and dermapteran predators co-occurring in an agroecosystem in Brazil. By matching unassembled reads against six DNA reference databases obtained from public databases and newly assembled mitogenomes, and filtering for high overlap length and identity, we identified prey and other foreign DNA in the predator guts. Good taxonomic breadth and resolution was achieved (93% of prey identified to species or genus), but with low recovery of matching reads. Two to nine trophic interactions were found for these predators, some of which were only inferred by the presence of parasitoids and components of the microbiome known to be associated with aphid prey. Intraguild predation was also found, including among closely related ladybird species. Uncertainty arises from the lack of comprehensive reference databases and reliance on low numbers of matching reads accentuating the risk of false positives. We discuss caveats and some future prospects that could improve the use of direct DNA shotgun-sequencing to characterize arthropod trophic networks.
BackgroundThe accurate determination of orthology and inparalogy relationships is essential for comparative sequence analysis, functional gene annotation and evolutionary studies. Various methods have been developed based on either simple blast all-versus-all pairwise comparisons and/or time-consuming phylogenetic tree analyses.ResultsWe have developed OrthoInspector, a new software system incorporating an original algorithm for the rapid detection of orthology and inparalogy relations between different species. In comparisons with existing methods, OrthoInspector improves detection sensitivity, with a minimal loss of specificity. In addition, several visualization tools have been developed to facilitate in-depth studies based on these predictions. The software has been used to study the orthology/in-paralogy relationships for a large set of 940,855 protein sequences from 59 different eukaryotic species.ConclusionOrthoInspector is a new software system for orthology/paralogy analysis. It is made available as an independent software suite that can be downloaded and installed for local use. Command line querying facilitates the integration of the software in high throughput processing pipelines and a graphical interface provides easy, intuitive access to results for the non-expert.
OrthoInspector is one of the leading software suites for orthology relations inference. In this paper, we describe a major redesign of the OrthoInspector online resource along with a significant increase in the number of species: 4753 organisms are now covered across the three domains of life, making OrthoInspector the most exhaustive orthology resource to date in terms of covered species (excluding viruses). The new website integrates original data exploration and visualization tools in an ergonomic interface. Distributions of protein orthologs are represented by heatmaps summarizing their evolutionary histories, and proteins with similar profiles can be directly accessed. Two novel tools have been implemented for comparative genomics: a phylogenetic profile search that can be used to find proteins with a specific presence-absence profile and investigate their functions and, inversely, a GO profiling tool aimed at deciphering evolutionary histories of molecular functions, processes or cell components. In addition to the re-designed website, the OrthoInspector resource now provides a REST interface for programmatic access. OrthoInspector 3.0 is available at http://lbgi.fr/orthoinspectorv3.
High-throughput DNA methods hold great promise for phylogenetic analysis of lineages that are difficult to study with conventional molecular and morphological approaches. The mites (Acari), and in particular the highly diverse soil-dwelling lineages, are among the least known branches of the metazoan Tree-of-Life. We extracted numerous minute mites from soils in an area of mixed forest and grassland in southern Iberia. Selected specimens representing the full morphological diversity were shotgun sequenced in bulk, followed by genome assembly of short reads from the mixture, which produced >100 mitochondrial genomes representing diverse acarine lineages. Phylogenetic analyses in combination with taxonomically limited mitogenomes available publicly resulted in plausible trees defining basal relationships of the Acari. Several critical nodes were supported by ancestral-state reconstructions of mitochondrial gene rearrangements. Molecular calibration placed the minimum age for the common ancestor of the superorder Acariformes, which includes most soil-dwelling mites, to the Cambrian–Ordovician (likely within 455–552 Ma), whereas the origin of the superorder Parasitiformes was placed later in the Carboniferous-Permian. Most family-level taxa within the Acariformes were dated to the Jurassic and Triassic. The ancient origin of Acariformes and the early diversification of major extant lineages linked to the soil are consistent with a pioneering role for mites in building the earliest terrestrial ecosystems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.