2018
DOI: 10.1101/329045
Preprint

mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species

Abstract: Assembly of bacterial short-read whole genome sequencing (WGS) data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Long-read sequencing has emerged as a solution to resolve plasmid structures and to obtain complete genomes for most bacterial species. This information can be used to generate and label datasets from short-read based contigs as plasmid- or chromosome-derived. We investigated the use of several popular machine learning methods to classify short-re…

Cited by 50 publications (75 citation statements)
References 31 publications (32 reference statements)

“…Detailed description of Illumina and ONT sequencing is available at Supplementary Methods S1-S5 and in Arredondo-Alonso et al (Arredondo-Alonso et al 2018) which includes a full description on ONT selection of E. faecium isolates (n = 62) and consecutive hybrid assembly using Unicycler (Wick et al 2017).…”
Section: Genomic DNA Sequencing and Assembly (mentioning)
confidence: 99%
“…To determine the plasmidome content of the remaining 1,582 isolates, we used mlplasmids which provides a support-vector machine classifier which was trained and tested using complete genome sequences from E. faecium (Arredondo-Alonso et al 2018). In short, short-read contigs derived from our completed genomes were mapped against the finished chromosomes and plasmids to obtain a short-read contigs dataset labeled either as chromosome- or plasmid-derived.…”
Section: Characterization of Fully Assembled Plasmids (mentioning)
confidence: 99%
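
The labeling step described in the statement above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the mapping has already been run and summarized as one best-matching reference replicon per contig. The function name `label_contigs`, the input dictionaries, and the contig and replicon names are all hypothetical.

```python
def label_contigs(best_hits, replicon_types):
    """Label short-read contigs by the type of replicon they map to.

    best_hits: contig id -> name of the best-matching reference replicon,
               or None if the contig did not map (hypothetical input format).
    replicon_types: replicon name -> "chromosome" or "plasmid".
    """
    labels = {}
    for contig, replicon in best_hits.items():
        if replicon is None:
            # Unmapped contigs get no label and are left out of the dataset.
            continue
        labels[contig] = replicon_types[replicon]
    return labels


# Hypothetical example: one complete genome with a chromosome and one plasmid.
replicon_types = {"chr_1": "chromosome", "plasmid_1": "plasmid"}
best_hits = {"contig_1": "chr_1", "contig_2": "plasmid_1", "contig_3": None}
labels = label_contigs(best_hits, replicon_types)
```

The resulting labels are the kind of ground truth a classifier such as the support-vector machine mentioned above would be trained and tested on.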
“…To tackle this issue, many new bioinformatic tools have recently been developed, following different approaches: (i) Recycler and plasmidSPAdes [15,16] exploit coverage variations of sequenced DNA fragments within a genome; (ii) PLACNET investigates paired-end reads linking contig ends [17]; (iii) PlasmidFinder searches for plasmid specific motifs, i.e. incompatibility groups [18]; (iv) cBar, PlasFlow and mlPlasmids use machine learning methods to classify k-mer frequencies [19][20][21];…”
Section: Introduction (mentioning)
confidence: 99%
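
To make "classify k-mer frequencies" concrete, the sketch below turns a contig into a k-mer frequency profile, the kind of feature vector a machine learning classifier could take as input. It is an illustration only: the function name `kmer_frequencies`, the default `k=5`, and the handling of ambiguous bases are assumptions, not details taken from any of the cited tools.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=5):
    """Return the relative frequency of every DNA k-mer in `sequence`.

    The result has one entry per possible k-mer over A, C, G, T (4**k entries).
    k-mers containing other characters (for example N) are ignored.
    """
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        # Sequence shorter than k, or no unambiguous k-mers at all.
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


# Toy example with k=2 so the profile stays small (16 entries).
freqs = kmer_frequencies("ACGTACGTAC", k=2)
```

Normalizing by the total count keeps profiles comparable across contigs of different lengths, which matters when contig sizes vary widely.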
“…Due to increasing amounts of generated sequence data [24], there is a rising need for automated high-throughput analysis tools. Unfortunately, not all currently available bioinformatics software tools are suitable for high-throughput analysis, let alone the technical integration into larger analysis pipelines [25][26][27] due to interactive designs or web-based implementations [17,18,21,28]. Taxon-specific database designs also pose additional barriers as users might not have sufficient computational resources or bioinformatics support to build customized or large multi-taxon databases [20,22].…”
Section: Introduction (mentioning)
confidence: 99%