SonicParanoid: fast, accurate and easy orthology inference

Cosentino, Salvatore

doi:10.1093/bioinformatics/bty631

Cited by 122 publications

(129 citation statements)

References 20 publications

Supporting

Mentioning

128

Contrasting

“…In these tests, the runtimes of Broccoli were found between those of Sonicparanoid and OrthoFinder2 (Figure 4). Regarding the two extremes of the speed spectrum, OrthoFinder2 with the MSA option was by far the slowest pipeline, and Sonicparanoid, which only performs half of similarity searches (Cosentino and Iwasaki 2019), was found to be the fastest for every dataset. The same speed rank was observed when analysing the QfO 2018 dataset, which contains 78 species, using 8 CPUs: Sonicparanoid (522 minutes), Broccoli (634 minutes) and OrthoFinder2 (850 minutes; we did not test the MSA option using this dataset).…”

Section: Running Time Analyses

mentioning

confidence: 99%
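The QfO 2018 runtimes quoted above can be put on a common scale. The following is a minimal sketch of that arithmetic and uses only the three figures given in the statement; the dictionary and function names are illustrative and do not come from any of the cited tools.

```python
# Runtimes in minutes as reported in the quoted statement
# (QfO 2018 dataset, 78 species, 8 CPUs).
runtimes_min = {"SonicParanoid": 522, "Broccoli": 634, "OrthoFinder2": 850}


def relative_to_fastest(runtimes):
    """Return each runtime divided by the fastest runtime."""
    fastest = min(runtimes.values())
    return {tool: minutes / fastest for tool, minutes in runtimes.items()}


ratios = relative_to_fastest(runtimes_min)
for tool, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{tool}: {ratio:.2f}x the fastest runtime")
```

On these figures Broccoli took about 1.21 times and OrthoFinder2 about 1.63 times as long as SonicParanoid.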

“…Current de novo clustering algorithms are all based on the analysis of pairwise protein distances. Two main approaches have been proposed: distances can be analysed (i) using the best bi-directional hits (BBH) approach or one of its derivative to infer orthologous pairs as implemented in Hieranoid or OMA (Huynen and Bork 1998;Roth, et al 2008;Schreiber and Sonnhammer 2013;Sonnhammer and Ostlund 2015;Cosentino and Iwasaki 2019), or (ii) using the Markov Cluster algorithm (MCL) to infer orthologous groups from the network of similarities (Dongen 2000;Li, et al 2003;Emms and Kelly 2015), orthologous groups that can further be analysed using phylogenetic analyses and a species tree reconciliation approach to infer orthologous pairs (Emms and Kelly 2019). The BBH approach is highly precise but is inclined to miss orthologous pairs due to its highly constrained nature (Dalquen and Dessimoz 2013).…”

Section: Introduction

mentioning

confidence: 99%
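The best bi-directional hits (BBH) idea mentioned in the statement above can be sketched in a few lines. This is a simplified illustration, not the procedure used by Hieranoid, OMA, or SonicParanoid: the score dictionaries, the gene names, and both function names are assumptions made for the example, and ties are resolved by keeping the first hit seen.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score
    (for example a bit score); higher is better.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores, invented for illustration only.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 150.0, ("a3", "b2"): 120.0}
b_vs_a = {("b1", "a1"): 198.0, ("b2", "a2"): 149.0, ("b2", "a3"): 121.0}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data `a3` also has `b2` as its best hit, but `b2` prefers `a2`, so the pair is dropped. That is the constrained behaviour the statement describes: precise, but liable to miss pairs.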

Broccoli: combining phylogenetic and network analyses for orthology assignment

Derelle

Piégay

Colbourne

2019

Preprint

Orthology assignment is a key step of comparative genomic studies, for which many bioinformatic tools have been developed. However, all gene clustering pipelines are based on the analysis of protein distances, which are subject to many artefacts. In this paper we introduce Broccoli, a user-friendly pipeline designed to infer, with high precision, orthologous groups and pairs of proteins using a phylogeny-based approach. Briefly, Broccoli performs ultra-fast phylogenetic analyses on most proteins and builds a network of orthologous relationships. Orthologous groups are then identified from the network using a parameter-free machine learning algorithm. Broccoli is also able to detect chimeric proteins resulting from gene-fusion events and to assign these proteins to the corresponding orthologous groups. Tested on two benchmark datasets, Broccoli outperforms current orthology pipelines. In addition, Broccoli is scalable, with runtimes similar to those of recent distance-based pipelines. Given its high level of performance and efficiency, this new pipeline represents a suitable choice for comparative genomic studies. Broccoli is freely available at https://github.com/rderelle/Broccoli.

“…Phylogenetic relationships between two subgroups of L. buchneri and their related species were inferred based on conserved single-copy genes as follows. Protein-coding genes for each genome were (re-)annotated using dfast (version 1.2.2) [35], then clustered into orthologous groups using SonicParanoid (version 1.0) [36]. Each of the identified 964 single-copy orthologues were aligned using muscle (ver 3.8.31) [37], followed by elimination of poorly aligned positions and divergent regions by Gblocks (version 0.91b) [38].…”

mentioning

confidence: 99%
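The step of picking single-copy orthologues out of clustered orthologous groups, as described in the statement above, can be sketched as follows. The in-memory layout (group id mapped to species mapped to a list of gene ids) is a hypothetical structure chosen for the example; it is not SonicParanoid's output format, and the function name is likewise an assumption.

```python
def single_copy_groups(groups, species):
    """Keep groups that contain exactly one gene from every listed species.

    `groups` maps a group id to {species_name: [gene ids]}.
    Returns {group id: {species_name: gene id}} for the groups kept.
    """
    wanted = set(species)
    kept = {}
    for group_id, members in groups.items():
        # Species that actually contribute at least one gene to this group.
        present = {sp for sp, genes in members.items() if genes}
        if present == wanted and all(len(members[sp]) == 1 for sp in wanted):
            kept[group_id] = {sp: members[sp][0] for sp in sorted(wanted)}
    return kept


# Toy groups, invented for illustration only.
groups = {
    "OG1": {"sp1": ["g1"], "sp2": ["g2"], "sp3": ["g3"]},        # single-copy
    "OG2": {"sp1": ["g4", "g5"], "sp2": ["g6"], "sp3": ["g7"]},  # duplicated in sp1
    "OG3": {"sp1": ["g8"], "sp2": ["g9"]},                       # missing sp3
}

single_copy = single_copy_groups(groups, ["sp1", "sp2", "sp3"])
print(single_copy)
```

Only `OG1` survives the filter. Each retained group would then be aligned separately in a workflow like the one quoted.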

Lactobacillus buchneri subsp. silagei subsp. nov., isolated from rice grain silage

Tanizawa

Kobayashi

Nomura

et al. 2020

International Journal of Systematic and Evolutionary Microbiology

Two Gram-stain-positive, rod-shaped, non-motile, non-spore-forming, catalase-negative bacteria, designated strains SG162T and NK01, were isolated from Japanese rice grain silage and total mixed ration silage, respectively. They were initially identified as Lactobacillus buchneri based on the 16S rRNA gene sequence similarities. However, the two strains were separated into a distinct clade from L. buchneri DSM 20057T (=JCM 1115T) through whole-genome sequence-based characterization, forming an infraspecific subgroup together with strains CD034 and S42, whose genomic sequences were available in the public sequence database. Strains within the subgroup shared 99.4–99.7 % average nucleotide identity (ANI) and 97.5–99.0 % digital DNA–DNA hybridization (dDDH) with each other, albeit 96.9–97.0 % ANI and 76.0–76.6 % dDDH against DSM 20057T. Strains SG162T and NK01 could utilize more substrates as sole carbon sources than DSM 20057T, potentially owing to the abundance of genes involved in carbon metabolism, especially the Entner–Doudoroff pathway. The inability of γ-aminobutyric acid (GABA) production was evidenced by the lack of glutamate decarboxylase and glutamate/GABA antiporter genes in the new subgroup strains. Strain SG162T grew at 10–45 °C (optimum, 30 °C), pH 3.5–8.0, and 0–8 % (w/v) NaCl. Its genomic DNA G+C content was 44.1 mol%. The predominant fatty acids were C16 : 0, C19 : 0 cyclo ω8c, and summed feature 8. On the basis of the polyphasic characterization findings, strains SG162T and NK01 represent a novel subspecies of L. buchneri , for which the name Lactobacillus buchneri subsp. silagei subsp. nov. is proposed. The type strain is SG162T (=JCM 32599T=DSM 107969T), and strains CD034 and S42 are also transferred to L. buchneri subsp. silagei.

“…QfO service evaluates the predictive quality by performing four phylogeny-based tests of Species Gene Ontology conservation test and Enzyme Classification conservation test[24]. We also applied two more orthology prediction tools, SonicParanoid[47] and InParanoid (v4.1)[4], on the QfO 2011 set and used their results as control. The pairwise orthology relationships were extracted from the predicted orthologous groups of all the tools, including SonicParanoid and InParanoid, and then submitted to the QfO web-service The homology search results show that BLASTP detected the largest number of homologs (947,203,546).…”

mentioning

confidence: 99%
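Extracting pairwise orthology relationships from predicted orthologous groups, as the statement above describes, amounts to enumerating cross-species gene pairs within each group. The sketch below assumes each group is a list of (species, gene) tuples and treats same-species members as in-paralogs that are not emitted as pairs; both the layout and that rule are assumptions for illustration, not the procedure of the cited study.

```python
from itertools import combinations


def orthologous_pairs(groups):
    """Return the set of cross-species gene pairs found within each group.

    `groups` is a list of groups; each group is a list of (species, gene)
    tuples. Each pair is stored as a sorted tuple so (x, y) == (y, x).
    """
    pairs = set()
    for group in groups:
        for (sp1, gene1), (sp2, gene2) in combinations(group, 2):
            if sp1 != sp2:  # same-species members are skipped
                pairs.add(tuple(sorted((gene1, gene2))))
    return pairs


# Toy groups, invented for illustration only.
groups = [
    [("HUMAN", "h1"), ("MOUSE", "m1"), ("MOUSE", "m2")],
    [("HUMAN", "h2"), ("YEAST", "y1")],
]

pairs = orthologous_pairs(groups)
print(sorted(pairs))
```

The two mouse genes in the first group are each paired with the human gene but not with each other, giving three pairs in total.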

“…A comparison of several methods that include both tree-based and graph-based methods found that tree-based methods had even a worse performance than graph-based methods on large dataset [10]. One study compared several common methods including RBH, graph-based and tree-based and found that tree-based methods often give a higher specificity but lower sensitivity [20]. Several studies have also shown that graph-based methods find a better trade-off between specificity and sensitivity than tree-based methods [10,20,21].…”

mentioning

confidence: 99%
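The specificity and sensitivity trade-off referred to above can be made concrete with the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), computed over sets of candidate pairs. The sketch below is generic: the pair labels, the two hypothetical prediction sets, and the function name are invented for illustration and do not come from any benchmark.

```python
def sensitivity_specificity(predicted, reference, universe):
    """Return (sensitivity, specificity) for a set of predicted pairs.

    `universe` is the set of all candidate pairs, `reference` the pairs
    taken as true, and `predicted` the pairs reported by a method.
    """
    predicted, reference, universe = set(predicted), set(reference), set(universe)
    tp = len(predicted & reference)             # true pairs that were reported
    fp = len(predicted - reference)             # reported pairs that are not true
    fn = len(reference - predicted)             # true pairs that were missed
    tn = len(universe - predicted - reference)  # correctly unreported pairs
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
universe = {f"p{i}" for i in range(10)}
reference = {"p0", "p1", "p2", "p3"}
strict = {"p0", "p1"}                                # few calls, all correct
permissive = {"p0", "p1", "p2", "p3", "p4", "p5"}    # finds everything, plus errors

strict_scores = sensitivity_specificity(strict, reference, universe)
permissive_scores = sensitivity_specificity(permissive, reference, universe)
print(strict_scores, permissive_scores)
```

The strict set reaches full specificity at half the sensitivity, while the permissive set recovers every reference pair at the cost of false positives.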

SwiftOrtho: a Fast, Memory-Efficient, Multiple Genome Orthology Classifier

Friedberg

2019

Preprint

Introduction: Gene homology type classification is a requisite for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. A large variety of tools have been developed to perform homology classification across genomes of different species. However, when applied to large genomic datasets, these tools require high memory and CPU usage, typically available only in costly computational clusters. To address this problem, we developed a new graph-based orthology analysis tool, SwiftOrtho, which is optimized for speed and memory usage when applied to large-scale data. Results: In our tests, SwiftOrtho is the only tool that completed orthology analysis of 1,760 bacterial genomes on a computer with only 4GB RAM. Using various standard orthology datasets, we also show that SwiftOrtho has a high accuracy. SwiftOrtho enables the accurate comparative genomic analyses of thousands of genomes using low memory computers. Availability: https://github.com/Rinoahu/SwiftOrtho

Background: Gene homology type classification consists of identifying paralogs and orthologs across species. Orthologs are genes that evolved from a common ancestral gene following speciation, while paralogs are genes that are homologous due to duplication. Computationally detecting orthologs and paralogs across species is an important problem, as the evolutionary history of genes has implications for our understanding of gene function and evolution. While the proper inference of homology type involves tracing gene history using phylogenetic trees [1], several proxy methods have been developed over the years. The most common method to infer orthologs by proxy is Reciprocal Best Hit or RBH [2, 3]. Briefly, RBH states the following: when two proteins that are encoded they are considered to be orthologs [2,3]. Inparanoid extends the RBH orthology relationship to include both orthologs and in-paralogs [4][5][6]. Specifically, Inparanoid distinguishes between orthologs and in-paralogs, which were duplicated following a given speciation event [4][5][6]. It is then a matter of course to extend orthologous pairs between two species to an ortholog group, where an ortholog group is defined as a set of genes that are hypothesized to have descended from a common ancestor [6]. Several methods have been developed to identify ortholog groups across multiple species. These methods can be classified into two types: tree-based and graph-based. Tree-based methods construct a gene tree from an alignment of homologous sequences in different species and infer orthology relationships by reconciling the gene tree with its corresponding species tree [1,7,8]. Tree-based methods can infer a correct orthology relationship if the correct gene tree and species tree are given [9]. The main limitation of tree-based methods is the accuracy of the given gene tree and species tree. Erroneous trees lead to incorrect ortholog and in-paralog assignments [8][9][10]. Tree-based methods ...

SonicParanoid: fast, accurate and easy orthology inference

Abstract: Supplementary data are available at Bioinformatics online.
