Michael P. O’Brien scite author profile

Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity

Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.


A practical fpt algorithm for Flow Decomposition and transcript assembly

Kloster

Kuinke

O’Brien

et al. 2018


The Flow Decomposition problem, which asks for the smallest set of weighted paths that "covers" a flow on a DAG, has recently been used as an important computational step in transcript assembly. We prove the problem is in FPT when parameterized by the number of paths by giving a practical linear fpt algorithm. Further, we implement and engineer a Flow Decomposition solver based on this algorithm, and evaluate its performance on RNA-sequence data. Crucially, our solver finds exact solutions while achieving runtimes competitive with a state-of-the-art heuristic. Finally, we contextualize our design choices with two hardness results related to preprocessing and weight recovery. Specifically, k-Flow Decomposition does not admit polynomial kernels under standard complexity assumptions, and the related problem of assigning (known) weights to a given set of paths is NP-hard.
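As a minimal sketch of what "covers" means here, assuming the usual reading that the path weights on each edge must sum exactly to that edge's flow, the check below validates a candidate decomposition. The function name and the input layout are hypothetical and chosen for illustration; this is not the solver described in the abstract.

```python
from collections import Counter


def is_flow_decomposition(flow, weighted_paths):
    """Check whether weighted paths reproduce a flow edge by edge.

    flow: dict mapping a directed edge (u, v) to its flow value.
    weighted_paths: list of (path, weight), where path is a vertex sequence.
    Both layouts are assumptions made for this illustration.
    """
    total = Counter()
    for path, weight in weighted_paths:
        # Each consecutive vertex pair on the path is one edge it uses.
        for edge in zip(path, path[1:]):
            total[edge] += weight
    # Compare on the union so unused flow edges and stray path edges both fail.
    edges = set(flow) | set(total)
    return all(flow.get(e, 0) == total.get(e, 0) for e in edges)


flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 3, ("b", "t"): 2}
good = [(["s", "a", "t"], 3), (["s", "b", "t"], 2)]
bad = [(["s", "a", "t"], 3), (["s", "b", "t"], 1)]
print(is_flow_decomposition(flow, good))
print(is_flow_decomposition(flow, bad))
```

Verifying a candidate is easy; the hard part the abstract addresses is finding the smallest such set of paths.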


An open repository of real-time COVID-19 indicators

Reinhart

Brooks

Jahja

et al. 2021

Proc. Natl. Acad. Sci. U.S.A.


The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: Operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from deidentified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data are available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID activity, augmenting traditional public health reporting and empowering research and decision-making.


Locally Estimating Core Numbers

O’Brien

Sullivan

2014


Graphs are a powerful way to model interactions and relationships in data from a wide variety of application domains. In this setting, entities represented by vertices at the "center" of the graph are often more important than those associated with vertices on the "fringes". For example, central nodes tend to be more critical in the spread of information or disease and play an important role in clustering/community formation. Identifying such "core" vertices has recently received additional attention in the context of network experiments, which analyze the response when a random subset of vertices is exposed to a treatment (e.g. inoculation, free product samples, etc.). Specifically, the likelihood of having many central vertices in any exposure subset can have a significant impact on the experiment. We focus on using k-cores and core numbers to measure the extent to which a vertex is central in a graph. Existing algorithms for computing the core number of a vertex require the entire graph as input, an unrealistic scenario in many real world applications. Moreover, in the context of network experiments, the subgraph induced by the treated vertices is only known in a probabilistic sense. We introduce a new method for estimating the core number based only on the properties of the graph within a region of radius δ around the vertex, and prove an asymptotic error bound of our estimator on random graphs. Further, we empirically validate the accuracy of our estimator for small values of δ on a representative corpus of real data sets. Finally, we evaluate the impact of improved local estimation on an open problem in network experimentation posed by Ugander et al.
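For context, core numbers are commonly computed by a global "peeling" procedure that repeatedly removes a vertex of minimum remaining degree. The sketch below shows that standard whole-graph baseline, not the local estimator the abstract introduces; the function name and the adjacency-dict input are assumptions for illustration, and this simple version favors clarity over speed.

```python
def core_numbers(adj):
    """Return {vertex: core number} for an undirected graph.

    adj: dict mapping each vertex to the set of its neighbors
    (a hypothetical input layout chosen for this sketch).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex with the smallest degree among those still present.
        v = min(remaining, key=lambda u: degree[u])
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle a-b-c with a pendant vertex d attached to a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(core_numbers(graph))
```

Note that the procedure needs every vertex's degree up front, which is exactly the whole-graph requirement the abstract calls unrealistic.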


Zig-Zag Numberlink is NP-Complete

Adcock

Demaine

et al. 2015

Journal of Information Processing


When can t terminal pairs in an m × n grid be connected by t vertex-disjoint paths that cover all vertices of the grid? We prove that this problem is NP-complete. Our hardness result can be compared to two previous NP-hardness proofs: Lynch's 1975 proof without the "cover all vertices" constraint, and Kotsuma and Takenaga's 2010 proof when the paths are restricted to have the fewest possible corners within their homotopy class. The latter restriction is a common form of the famous Nikoli puzzle Numberlink. Our problem is another common form of Numberlink, sometimes called Zig-Zag Numberlink and popularized by the smartphone app Flow Free.
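The three conditions in the problem statement (paths join the terminal pairs, are vertex-disjoint, and cover every grid vertex) can be checked in polynomial time, which is the easy half of NP-completeness. The sketch below is one hypothetical verifier; the function name, the (row, column) cell encoding, and the argument layout are assumptions for illustration, not taken from the paper.

```python
def is_zigzag_solution(rows, cols, terminal_pairs, paths):
    """Verify a candidate Zig-Zag Numberlink solution.

    terminal_pairs: list of ((r, c), (r, c)) cell pairs to connect.
    paths: list of cell sequences, one per pair, in any order.
    """
    if any(len(p) < 2 for p in paths):
        return False
    cells = [cell for p in paths for cell in p]
    grid = {(r, c) for r in range(rows) for c in range(cols)}
    # Same count plus same set means every cell is used exactly once,
    # which gives both vertex-disjointness and full coverage.
    if len(cells) != rows * cols or set(cells) != grid:
        return False
    # Consecutive cells on a path must be orthogonal grid neighbors.
    for p in paths:
        for (r1, c1), (r2, c2) in zip(p, p[1:]):
            if abs(r1 - r2) + abs(c1 - c2) != 1:
                return False
    # Path endpoints must match the terminal pairs, ignoring orientation.
    ends = sorted(tuple(sorted((p[0], p[-1]))) for p in paths)
    want = sorted(tuple(sorted(pair)) for pair in terminal_pairs)
    return ends == want


pairs = [((0, 0), (1, 0))]
snake = [[(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]]
direct = [[(0, 0), (1, 0)]]
print(is_zigzag_solution(2, 3, pairs, snake))
print(is_zigzag_solution(2, 3, pairs, direct))
```

The direct path connects the terminals but leaves cells uncovered, so it fails under the "cover all vertices" constraint that distinguishes this variant.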


Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity

Brown

Moritz

O’Brien

et al. 2018

Preprint


Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.

Keywords: metagenomics | sequence assembly | strain variation | bounded expansion | dominating set


Polynomial Treedepth Bounds in Linear Colorings

et al. 2020


Low-treedepth colorings are an important tool for algorithms that exploit structure in classes of bounded expansion; they guarantee subgraphs that use few colors have bounded treedepth. These colorings have an implicit tradeoff between the total number of colors used and the treedepth bound, and prior empirical work suggests that the former dominates the run time of existing algorithms in practice. We introduce p-linear colorings as an alternative to the commonly used p-centered colorings. They can be efficiently computed in bounded expansion classes and use at most as many colors as p-centered colorings. Although a set of k < p colors from a p-centered coloring induces a subgraph of treedepth at most k, the same number of colors from a p-linear coloring may induce subgraphs of larger treedepth. We establish a polynomial upper bound on the treedepth in general graphs, and give tighter bounds in trees and interval graphs via constructive coloring algorithms. We also give a co-NP-completeness reduction for recognizing p-linear colorings and discuss ways to overcome this limitation in practice.


An Open Repository of Real-Time COVID-19 Indicators

Reinhart

Brooks

Jahja

et al. 2021

Preprint


The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from de-identified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data is available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID activity, augmenting traditional public health reporting and empowering research and decision-making.


