Deep learning is driving recent advances behind many everyday technologies, including speech and image recognition, natural language processing and autonomous driving. It is also gaining popularity in biology, where it has been used for automated species identification, environmental monitoring, ecological modelling, behavioural studies, DNA sequencing, population genetics and phylogenetics, among other applications. Deep learning relies on artificial neural networks for predictive modelling and excels at recognizing complex patterns. In this review we synthesize 818 studies using deep learning in the context of ecology and evolution to give a discipline‐wide perspective necessary to promote a rethinking of inference approaches in the field. We provide an introduction to machine learning and contrast it with mechanistic inference, followed by a gentle primer on deep learning. We review the applications of deep learning in ecology and evolution and discuss its limitations and efforts to overcome them. We also provide a practical primer for biologists interested in including deep learning in their toolkit and identify its possible future applications. We find that deep learning is being rapidly adopted in ecology and evolution, with 589 studies (64%) published since the beginning of 2019. Most use convolutional neural networks (496 studies) and supervised learning for image identification but also for tasks using molecular data, sounds, environmental data or video as input. More sophisticated uses of deep learning in biology are also beginning to appear. Operating within the machine learning paradigm, deep learning can be viewed as an alternative to mechanistic modelling. It has desirable properties of good performance and scaling with increasing complexity, while posing unique challenges such as sensitivity to bias in input data.
We expect that rapid adoption of deep learning in ecology and evolution will continue, especially in automation of biodiversity monitoring and discovery and inference from genetic data. Increased use of unsupervised learning for discovery and visualization of clusters and gaps, simplification of multi‐step analysis pipelines, and integration of machine learning into graduate and postgraduate training are all likely in the near future.
Tong et al. comment on the accuracy of the dating analysis presented in our work on the phylogeny of insects and provide a reanalysis of our data. They replace log-normal priors with uniform priors and add a "roachoid" fossil as a calibration point. Although the reanalysis provides an interesting alternative viewpoint, we maintain that our choices were appropriate.
We present our current hypothesis on the phylogeny of Trichoptera, generated from an analysis of over 7000 nucleotides from 18S and 28S rRNA, EF-1α, COI, and CAD. We corroborate our earlier hypotheses, with results that include a monophyletic Annulipalpia, Integripalpia, Brevitentoria, and Plenitentoria. Monophyly of Psychomyioidea, Pseudoneureclipsidae, and Grumichellinae was confirmed. The "Spicipalpian" families were again found to be paraphyletic, and most closely related to Integripalpia. Ptilocolepidae was not found to be monophyletic, but support for its paraphyly was so weak that we interpret our results as unresolved. We interpret our measures of branch support, and present a collapsed phylogeny that more conservatively represents our current hypothesis. We discuss how these data can eventually be merged with other sources of data, such as COI barcode data and transcriptomes, and suggest that a single huge analysis of all data, with all taxa, is unnecessary if analyses can be phylogenetically subdivided into many separate parts, using transcriptome data to fix the deepest nodes, and allowing faster evolving data to be more appropriately targeted to nodes closer to the tips of the tree.
Butterflies are a diverse and charismatic insect group that is thought to have diversified via coevolution with plants and in response to dispersals following key geological events. These hypotheses have been poorly tested at the macroevolutionary scale because a comprehensive phylogenetic framework and datasets on global distributions and larval hosts of butterflies are lacking. We sequenced 391 genes from nearly 2,000 butterfly species to construct a new, phylogenomic tree of butterflies representing 92% of all genera and aggregated global distribution records and larval host datasets. We found that butterflies likely originated in what is now the Americas, ~100 Ma, shortly before the Cretaceous Thermal Maximum, then crossed Beringia and diversified in the Paleotropics. The ancestor of modern butterflies likely fed on Fabaceae, and most extant families were present before the K/Pg extinction. The majority of butterfly dispersals occurred from the tropics (especially the Neotropics) to temperate zones, largely supporting a "cradle" pattern of diversification. Surprisingly, host breadth changes and shifts to novel host plants had only modest impacts.
The first insect genome (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 different insects representing 20 orders. Here, we analyzed the best assembly for each insect and provide a "state of the field" perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technology. We show that while genomic efforts have been biased towards specific groups (e.g., Diptera), assemblies are generally contiguous with gene regions intact. Most notable, however, has been the impact of long-read sequencing; assemblies that incorporate long reads are ~48x more contiguous than those that do not.
Trichoptera (caddisflies) play an essential role in freshwater ecosystems; for instance, larvae process organic material from the water and are food for a variety of predators. Knowledge on the genomic diversity of caddisflies can facilitate comparative and phylogenetic studies thereby allowing scientists to better understand the evolutionary history of caddisflies. While Trichoptera are the most diverse aquatic insect order, they remain poorly represented in terms of genomic resources. To date, all long-read based genomes have been sequenced from individuals in the retreat-making suborder, Annulipalpia, leaving ∼275 Ma of evolution without high-quality genomic resources. Here, we report the first long-read based de novo genome assemblies of two tube case-making Trichoptera from the suborder Integripalpia, Agrypnia vestita Walker and Hesperophylax magnus Banks. We find that these tube case-making caddisflies have genome sizes that are at least three-fold larger than those of currently sequenced annulipalpian genomes and that this pattern is at least partly driven by major expansion of repetitive elements. In H. magnus, long interspersed nuclear elements (LINEs) alone exceed the entire genome size of some annulipalpian counterparts suggesting that caddisflies have high potential as a model for understanding genome size evolution in diverse insect lineages.
Significance: There is a lack of genomic resources for aquatic insects. So far, only three high-quality genomes have been assembled, all from individuals in the retreat-making suborder Annulipalpia. In this article, we report the first high-quality genomes of two case-making species from the suborder Integripalpia, which are essential for studying genomic diversity across this ecologically diverse insect order. Our research reveals larger genome sizes in the tube case-makers (suborder Integripalpia, infraorder Phryganides), accompanied by a disproportionate increase of repetitive DNA. This suggests that genome size is at least partly driven by a major expansion of repetitive elements. Our work shows that caddisflies have high potential as a model for understanding how genomic diversity might be linked to functional diversification and forms the basis for detailed studies on genome size evolution in caddisflies.
Data deposition: This project has been deposited at NCBI under the Bioproject ID: PRJNA668166
Aquatic insects comprise 10% of all insect diversity, can be found on every continent except Antarctica, and are key components of freshwater ecosystems. Yet aquatic insect genome biology lags dramatically behind that of terrestrial insects. If genomic effort were spread evenly, one aquatic insect genome would be sequenced for every ~9 terrestrial insect genomes. Instead, ~24 terrestrial insect genomes have been sequenced for every aquatic insect genome. This discrepancy is even more dramatic if the quality of genomic resources is considered; for instance, while no aquatic insect genome has been assembled to the chromosome level, 29 terrestrial insect genomes spanning four orders have. We argue that a lack of aquatic insect genomes is not due to any underlying difficulty (e.g., small body sizes or unusually large genomes), yet it is severely hampering aquatic insect research at both fundamental and applied scales. By expanding the availability of aquatic insect genomes, we will gain key insight into insect diversification and empower future research for a globally important taxonomic group.
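The ratios quoted in the abstract above follow from simple arithmetic, which can be sketched as below. This is only an illustration of the reasoning: the function names are hypothetical, and the inputs (a 10% aquatic share of insect diversity and ~24 terrestrial genomes per aquatic genome) are the approximate figures stated in the abstract, not values recomputed from the underlying data.

```python
def expected_terrestrial_per_aquatic(aquatic_fraction: float) -> float:
    """Terrestrial genomes expected per aquatic genome if sequencing
    effort were proportional to each group's share of insect diversity."""
    if not 0.0 < aquatic_fraction < 1.0:
        raise ValueError("aquatic_fraction must lie strictly between 0 and 1")
    return (1.0 - aquatic_fraction) / aquatic_fraction


def aquatic_share(terrestrial_per_aquatic: float) -> float:
    """Fraction of all genomes that are aquatic, given a
    terrestrial-to-aquatic ratio."""
    return 1.0 / (1.0 + terrestrial_per_aquatic)


# Figures taken from the abstract (both approximate).
expected_ratio = expected_terrestrial_per_aquatic(0.10)  # about 9 to 1
observed_ratio = 24.0                                    # about 24 to 1

# How many times fewer aquatic genomes exist than an even spread implies.
shortfall = observed_ratio / expected_ratio

print(f"expected ratio: {expected_ratio:.1f}")
print(f"aquatic share of sequenced genomes: {aquatic_share(observed_ratio):.0%}")
print(f"shortfall factor: {shortfall:.2f}")
```

Under these assumptions, aquatic insects make up roughly 4% of sequenced genomes against 10% of diversity, so they are under-sampled by a factor of about 2.7.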
Hydroptilidae is an extremely diverse family within Trichoptera, containing over 2,600 known species, that displays a wide array of ecological, morphological, and habitat diversity. However, exploration into the evolutionary history of microcaddisflies based on current phylogenetic methods is mostly lacking. The purpose of this study is to provide a proof-of-concept that the use of molecular data, particularly targeted enrichment data, and statistically supported methods of analysis can result in the construction of a stable phylogenetic framework for the microcaddisflies. Here, a preliminary exploration of the hydroptilid phylogeny is presented using a combination of targeted enrichment data for ca. 300 nuclear protein-coding genes and legacy (Sanger-based) sequence data for the mitochondrial COI gene and partial sequence from the 28S rRNA gene.