Background: The problem of finding the shortest absent words in DNA data has been recently addressed, and algorithms for its solution have been described. It has been noted that longer absent words might also be of interest, but the existing algorithms only provide generic absent words by trivially extending the shortest ones.
Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeon, thirteen bacteria and eight eukaryotes). We investigate whether the mutational biases that may explain the deficit of the shortest absent words in vertebrates also extend to other absent words, namely minimal absent words, and to other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform throughout different sets of minimal absent words. We further investigate the hypothesis that minimal absent words are inherited through common ancestry, using the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.
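To make the term concrete: a word is a minimal absent word of a sequence if it does not occur in the sequence while its longest proper prefix and longest proper suffix both do. The sketch below is a brute-force illustration of that definition only, not the algorithm used in the work summarized above; the function name, the `max_len` cut-off and the `alphabet` parameter are assumptions made for the example. The shortest absent words are simply the minimal absent words of smallest length.

```python
from itertools import product


def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Brute-force list of minimal absent words of `seq` up to `max_len`.

    Hypothetical helper for illustration; exponential in `max_len`,
    so it is only suitable for short words and small examples.
    """
    # Every substring of seq with length 1..max_len.
    present = {
        seq[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    }
    result = []
    for k in range(1, max_len + 1):
        for letters in product(alphabet, repeat=k):
            word = "".join(letters)
            if word in present:
                continue
            # An absent word is minimal when dropping its last letter and
            # dropping its first letter both give words that do occur.
            # For k == 1 both are the empty word, which always occurs.
            if k == 1 or (word[:-1] in present and word[1:] in present):
                result.append(word)
    return result


if __name__ == "__main__":
    # Toy two-letter alphabet to keep the output small.
    print(minimal_absent_words("AACA", 3, alphabet="AC"))
```

For the toy input, "ACC" is absent but not minimal, because its suffix "CC" is already absent.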
Given the wide range of methodologies employed, it is not possible to recommend the most appropriate one for assessing MCo. Researchers should adopt recognized standards in future work. This is needed before a consensus can be reached about the role that MCo plays in gait impairment in neurological diseases and its potential as a target for gait rehabilitation.
Huanglongbing (HLB), the most important citrus disease worldwide, is associated with bacteria transmitted by the Asian citrus psyllid (ACP), preferentially through new shoots present in the canopy. In a commercial citrus plant, the vegetative growth of the scion is influenced by the rootstock variety onto which it is grafted. Although all commercial citrus varieties planted in recent years are susceptible to HLB, the dynamics of the rootstock in the grafted plant could influence the progress of HLB, whether at the plant or grove scale. In this work, HLB incidence in 'Valencia' sweet orange grafted onto 16 rootstocks and its relationship to the tree canopy volume and flushing dynamics were evaluated in a field trial under ACP control. The experiment was conducted under rainfed conditions in Bebedouro, state of São Paulo, Brazil, from 2011 to 2019. 'Flying Dragon' trifoliate orange, known for its dwarfing characteristics, was among the rootstocks evaluated. A reduction in canopy volume of 77% at 8 years of age was observed compared to the most vigorous rootstocks. The frequency of flush shoots of 'Valencia' sweet orange was not influenced by the rootstock, but the abundance of flush shoots was lower on three semi-dwarfing rootstocks as well as on 'Flying Dragon'. Although HLB incidence on 'Flying Dragon' was lower than on 'Rangpur' lime and three other semi-standard rootstocks (trees with canopy volume between 51 and 75% of the 'Rangpur' lime canopy volume), all other combinations had similar HLB disease progress regardless of the canopy volume and flushing dynamics. Moreover, under field conditions, variations in the cumulative HLB incidence greater than 26% were necessary to significantly separate rootstocks. Therefore, the results suggest that true dwarfing rootstocks have the potential to be integrated into the management program for HLB and that mechanisms in addition to tree vigor appear to be involved in the host-vector relationship.
Background: Next-generation sequencing (NGS) is bringing, besides huge amounts of data, an avalanche of new specialized tools (for analysis, compression and alignment, among others) and large public and private network infrastructures. Consequently, there is a growing need for specific simulation tools for testing and benchmarking, such as a flexible and portable FASTQ read simulator that does not need a reference sequence yet produces approximately the same characteristics as real data. Findings: We present XS, a FASTQ read simulation tool that is flexible, portable (it does not need a reference sequence) and tunable in terms of sequence complexity. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing for large-scale projects, and at testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality scores). Conclusions: XS provides an efficient and convenient method for fast simulation of FASTQ files, such as those from Ion Torrent (not currently covered by other simulators), Roche-454, Illumina and ABI-SOLiD sequencing machines. This tool is publicly available at http://bioinformatics.ua.pt/software/xs/.
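As a rough illustration of what reference-free FASTQ simulation involves, the sketch below generates the three FASTQ components (header, DNA sequence, quality scores) independently from a seeded random source. It is not XS and does not reproduce its models or options; the function name, the header pattern, the uniform base distribution and the Phred+33 quality range of 2 to 40 are all assumptions chosen for the example.

```python
import random


def simulate_fastq(n_reads, read_len, seed=0):
    """Return `n_reads` synthetic FASTQ records as one string.

    Hypothetical toy simulator: bases and qualities are drawn uniformly,
    which is far simpler than the characteristics of real sequencing data.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    records = []
    for i in range(1, n_reads + 1):
        header = f"@sim_read_{i}"
        bases = "".join(rng.choice("ACGT") for _ in range(read_len))
        # Phred+33 encoding: quality q is stored as the character chr(33 + q).
        quals = "".join(chr(33 + rng.randint(2, 40)) for _ in range(read_len))
        records.append(f"{header}\n{bases}\n+\n{quals}\n")
    return "".join(records)


if __name__ == "__main__":
    print(simulate_fastq(n_reads=2, read_len=20), end="")
```

Generating each component separately, as here, makes it possible to vary one of them (for instance the quality distribution) while holding the others fixed when benchmarking a compressor.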
The general approaches to detect and quantify metagenomic sample composition are based on the alignment of the reads against an existing database containing reference microbial sequences. However, without proper parameterization, these methods are not suitable for ancient DNA. Quantifying somewhat dissimilar sequences by alignment methods is problematic, due to the need for fine-tuned thresholds that allow relaxed edit distances and the consequent increase in computational cost. Additionally, the choice of the thresholds poses the problem of how to quantify similarity without producing overestimated measures. We propose FALCON-meta, a compression-based method to infer the metagenomic composition of next-generation sequencing samples. This unsupervised alignment-free method runs efficiently on FASTQ samples. FALCON-meta quickly learns how to give importance to the models that cooperate to predict similarity, incorporating parallelism and flexibility for multiple hardware characteristics. It shows substantial identification capabilities in ancient DNA without overestimation. In one of the examples, we found and authenticated an ancient Pseudomonas bacterium in a mammoth mitogenome. FALCON-meta can be accessed at https://github.com/pratas/falcon.
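The general idea behind compression-based, alignment-free similarity can be sketched with an off-the-shelf compressor: if a sample compresses much better when a reference is seen first, the two share content. The example below uses the normalized compression distance with Python's `zlib`, which is a generic stand-in and not the models or the measure used by FALCON-meta; the helper names and the toy mutation scheme are assumptions made for the illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: lower values suggest more shared content."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(n: int, seed: int) -> bytes:
    """Hypothetical helper: uniform random DNA of length n."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n)).encode()


def mutate(seq: bytes, step: int = 100) -> bytes:
    """Hypothetical helper: substitute one base every `step` positions."""
    out = bytearray(seq)
    for i in range(0, len(out), step):
        out[i] = ord("C") if out[i] == ord("A") else ord("A")
    return bytes(out)


if __name__ == "__main__":
    reference = random_dna(2000, seed=1)
    print("related  :", ncd(reference, mutate(reference)))
    print("unrelated:", ncd(reference, random_dna(2000, seed=2)))
```

No alignment threshold or edit distance has to be chosen here, although a general-purpose compressor like this one is far less suited to DNA than dedicated models.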