Given an undirected graph with nonnegative costs on the edges, the routing cost of any of its spanning trees is the sum, over all pairs of vertices, of the cost of the path between the pair in the tree. Finding a spanning tree of minimum routing cost is NP-hard, even when the costs obey the triangle inequality. We show that the general case is in fact reducible to the metric case and present a polynomial-time approximation scheme valid for both versions of the problem. In particular, we show how to build a spanning tree of an n-vertex weighted graph with routing cost at most $(1+\epsilon)$ times the minimum in time $O(n^{O(1/\epsilon)})$. Besides the obvious connection to network design, trees with small routing cost also find application in the construction of good multiple sequence alignments in computational biology.

The communication cost spanning tree problem is a generalization of the minimum routing cost tree problem in which the routing costs of different pairs are weighted by different requirement amounts. We observe that a randomized O(log n log log n)-approximation for this problem follows directly from a recent result of Bartal, where n is the number of nodes in a metric graph. This also yields the same approximation for the generalized sum-of-pairs alignment problem in computational biology.
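The routing cost of a fixed spanning tree can be evaluated in linear time, because each tree edge e lies on the path of exactly s·(n − s) unordered pairs, where s is the number of vertices on one side of T − e. The sketch below assumes vertices numbered 0..n−1, edges given as hypothetical (u, v, w) triples, and counts each unordered pair once (a convention that sums over ordered pairs would double the result). The function name `routing_cost` is illustrative; it only scores a given tree and is not the approximation scheme described above.

```python
from collections import defaultdict


def routing_cost(n, edges):
    """Sum of tree-path costs over all unordered vertex pairs of a spanning
    tree on vertices 0..n-1, given as (u, v, w) triples."""
    if len(edges) != n - 1:
        raise ValueError("a spanning tree on n vertices has n - 1 edges")
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Iterative DFS from vertex 0, recording each vertex's parent edge weight.
    parent = [-1] * n
    parent_w = [0] * n
    seen = [False] * n
    order = []
    stack = [0]
    seen[0] = True
    while stack:
        u = stack.pop()
        order.append(u)
        for v, w in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                parent_w[v] = w
                stack.append(v)
    if not all(seen):
        raise ValueError("edges do not connect all n vertices")

    # Descendants appear after their parent in `order`, so a reverse sweep
    # finishes every subtree size before its parent edge is charged.
    size = [1] * n
    total = 0
    for u in reversed(order):
        p = parent[u]
        if p != -1:
            # Edge (p, u) is on the path of every pair split by it.
            total += parent_w[u] * size[u] * (n - size[u])
            size[p] += size[u]
    return total
```

For the path 0–1–2 with weights 1 and 2, `routing_cost(3, [(0, 1, 1), (1, 2, 2)])` returns 6, matching the pair costs 1 + 2 + 3.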
We describe an algorithm for aligning two sequences within a diagonal band that requires only O(NW) computation time and O(N) space, where N is the length of the shorter of the two sequences and W is the width of the band. The basic algorithm can be used to calculate either local or global alignment scores. Local alignments are produced by finding the beginning and end of a best local alignment in the band, and then applying the global alignment algorithm between those points. This algorithm has been incorporated into the FASTA program package, where it has decreased the amount of memory required to calculate local alignments from O(NW) to O(N) and decreased the time required to calculate optimized scores for every sequence in a protein sequence database by 40%. On computers with limited memory, such as the IBM-PC, this improvement both allows longer sequences to be aligned and allows optimization within wider bands, which can include longer gaps.
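The core idea of banded, linear-space scoring can be sketched as a dynamic program that keeps only two rows, each indexed by diagonal offset within the band, so memory grows with the band width rather than with N·W. This is a minimal sketch assuming a simple linear gap penalty and illustrative scores (match = 1, mismatch = −1, gap = −2); FASTA's actual scoring (substitution matrices, affine gaps) and its local-alignment boundary search are not reproduced here. The function name `banded_global_score` and the band parameters `lo`/`hi` (allowed values of j − i) are hypothetical.

```python
NEG = float("-inf")


def banded_global_score(a, b, lo, hi, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a vs b, restricted to cells (i, j)
    with lo <= j - i <= hi. Only two rows of width hi - lo + 1 are kept."""
    m, n = len(a), len(b)
    if not (lo <= 0 <= hi and lo <= n - m <= hi):
        raise ValueError("band must contain the start and end diagonals")
    width = hi - lo + 1

    # prev[d] holds the score of cell (i - 1, (i - 1) + lo + d); row 0 first.
    prev = [NEG] * width
    for d in range(width):
        j = lo + d
        if 0 <= j <= n:
            prev[d] = j * gap  # leading gaps in a

    for i in range(1, m + 1):
        cur = [NEG] * width
        for d in range(width):
            j = i + lo + d
            if j < 0 or j > n:
                continue  # cell lies outside the matrix
            best = NEG
            if d + 1 < width:  # from (i - 1, j): gap in b
                best = max(best, prev[d + 1] + gap)
            if j >= 1:  # from (i - 1, j - 1): match or mismatch
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, prev[d] + s)
            if d >= 1:  # from (i, j - 1): gap in a
                best = max(best, cur[d - 1] + gap)
            cur[d] = best
        prev = cur
    return prev[n - m - lo]
```

For example, `banded_global_score("ACGT", "AGT", -1, 1)` aligns A/A, C/gap, G/G, T/T for a score of 1. Time is proportional to the number of band cells, and each row occupies width entries regardless of sequence length.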
Objective: Tuberculosis (TB) remains the leading cause of death among infectious diseases worldwide. It has been suggested as an important risk factor for chronic obstructive pulmonary disease (COPD), which is also a major cause of morbidity and mortality. This study investigated the impact of pulmonary TB and anti-TB treatment on the risk of developing COPD.
Design, Setting, and Participants: This cohort study used the National Health Insurance Database of Taiwan, specifically the Longitudinal Health Insurance Database 2005, to obtain 3,176 pulmonary TB cases and 15,880 control subjects matched for age, sex, and timing of entry into the database.
Main Outcome Measures: Hazard ratios of potential risk factors for COPD, especially pulmonary TB and anti-TB treatment.
Results: The mean age of pulmonary TB cases was 51.9±19.2 years. The interval between the initial study date and commencement of anti-TB treatment (delay in anti-TB treatment) was 75.8±65.4 days. Independent risk factors for developing COPD were age, male sex, low income, and history of pulmonary TB (hazard ratio 2.054 [1.768–2.387]), while diabetes mellitus was protective. The impact of TB persisted for six years after TB diagnosis and was significant in women and subjects aged >70 years. Among TB patients, delay in anti-TB treatment had a dose-response relationship with the risk of developing COPD.
Conclusions: Some cases of COPD may be preventable by controlling the TB epidemic, early TB diagnosis, and prompt initiation of appropriate anti-TB treatment. Follow-up care and early intervention for COPD may be necessary for treated TB patients.
We study two fundamental problems concerning the search for interesting regions in sequences: (i) given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum; and (ii) given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. We present an O(n)-time algorithm for the first problem and an O(n log L)-time algorithm for the second. The algorithms have potential applications in several areas of biomolecular sequence analysis, including locating GC-rich regions in a genomic DNA sequence, post-processing sequence alignments, annotating multiple sequence alignments, and computing length-constrained ungapped local alignments. Our preliminary tests on both simulated and real data demonstrate that the algorithms are very efficient and able to locate useful (such as GC-rich) regions.
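Problem (i) admits an O(n) solution with prefix sums and a sliding-window minimum: the best segment ending at position j is P[j] − min P[i] over j − U ≤ i ≤ j − 1, and a monotone deque maintains that minimum in amortized constant time. This is a sketch of that standard technique, not necessarily the authors' algorithm, and it does not address problem (ii). The function name `max_sum_segment_at_most` and its half-open return convention (best sum, start, end) are hypothetical.

```python
from collections import deque


def max_sum_segment_at_most(xs, U):
    """Maximum-sum nonempty segment xs[start:end] with end - start <= U.
    Returns (best_sum, start, end) in O(n) time."""
    if not xs or U < 1:
        raise ValueError("need a nonempty sequence and U >= 1")
    n = len(xs)
    prefix = [0] * (n + 1)
    for k, x in enumerate(xs):
        prefix[k + 1] = prefix[k] + x

    # Candidate start indices, kept with strictly increasing prefix values,
    # so dq[0] is always the window minimum.
    dq = deque()
    best = None
    for j in range(1, n + 1):
        i_new = j - 1  # newest admissible start for a segment ending at j
        while dq and prefix[dq[-1]] >= prefix[i_new]:
            dq.pop()  # dominated: later index with no larger prefix
        dq.append(i_new)
        while dq[0] < j - U:
            dq.popleft()  # start too far back: segment would exceed U
        s = prefix[j] - prefix[dq[0]]
        if best is None or s > best[0]:
            best = (s, dq[0], j)
    return best
```

For instance, with `xs = [1, -2, 3, 4, -1, 2]` the call with `U = 2` returns `(7, 2, 4)`, i.e. the segment [3, 4], while `U = 6` returns `(8, 2, 6)`.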