Shape is data and data is shape. Biologists are accustomed to thinking about how the shapes of biomolecules, cells, tissues, and organisms arise from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features: connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub-disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data are shape and shape is data, will be relevant as biology transitions into a data-driven era where the meaningful interpretation of large data sets is a limiting factor.
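As a rough illustration of how connected components evolve as the similarity threshold grows, the sketch below tracks only the zero-dimensional features (components) of a point cloud, using Euclidean distance as the similarity function and a union-find structure. The function name `h0_persistence` and the choice of Euclidean distance are assumptions made for this example; loops and voids require a full persistent homology library and are not covered here.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components.

    Every point is born as its own component at threshold 0. When the
    threshold reaches the length of an edge joining two different
    components, one of them dies. One component survives forever.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj           # merge the two components
            pairs.append((0.0, dist))  # one component dies at this scale
    pairs.append((0.0, math.inf))      # the last component never dies
    return pairs


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    for birth, death in h0_persistence(cloud):
        print(birth, death)
```

Long-lived pairs (a large gap between birth and death) indicate well-separated clusters, while short-lived pairs usually reflect sampling noise.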
The analysis of interaction between movement trajectories is of interest for various domains when movement of multiple objects is concerned. Interaction often includes a delayed response, making it difficult to detect interaction with current methods that compare movement at specific time intervals. We propose analyses and visualizations, on a local and global scale, of delayed movement responses, where an action is followed by a reaction over time, on trajectories recorded simultaneously. We developed a novel approach to compute the global delay in subquadratic time using a fast Fourier transform (FFT). Central to our local analysis of delays is the computation of a matching between the trajectories in a so-called delay space. It encodes the similarities between all pairs of points of the trajectories. In the visualization, the edges of the matching are bundled into patches, such that shape and color of a patch help to encode changes in an interaction pattern. To evaluate our approach experimentally, we have implemented it as a prototype visual analytics tool and have applied the tool to three bidimensional data sets. For this we used various measures to compute the delay space, including the directional distance, a new similarity measure, which captures more complex interactions by combining directional and spatial characteristics. We compare matchings of various methods computing similarity between trajectories. We also compare various procedures to compute the matching in the delay space, specifically the Fréchet distance, dynamic time warping (DTW), and edit distance (ED). Finally, we demonstrate how to validate the consistency of pairwise matchings by computing matchings between more than two trajectories.
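The abstract does not spell out how the FFT is used, so the following is only a minimal sketch of one standard way to estimate a single global delay in O(n log n) time: cross-correlating two equally sampled one-dimensional signals (for example, the speed of each object over time) and taking the lag with the highest correlation. The function name `global_delay` and the reduction of a trajectory to a one-dimensional signal are assumptions of this example, not the authors' method.

```python
import numpy as np


def global_delay(a, b):
    """Return the lag k (in samples) that maximizes sum_t a[t] * b[t + k].

    A positive result means that b follows a by k samples.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()  # remove the offset so the peak reflects shape, not level
    b = b - b.mean()

    # Zero-pad so that the circular correlation equals the linear one.
    n = len(a) + len(b) - 1
    size = 1 << (n - 1).bit_length()

    fa = np.fft.rfft(a, size)
    fb = np.fft.rfft(b, size)
    # Cross-correlation theorem: corr[k] = sum_t a[t] * b[t + k].
    corr = np.fft.irfft(np.conj(fa) * fb, size)

    # Non-negative lags sit at the front, negative lags wrap to the end.
    max_pos = len(b) - 1
    max_neg = len(a) - 1
    lags = np.concatenate((np.arange(0, max_pos + 1), np.arange(-max_neg, 0)))
    values = np.concatenate((corr[: max_pos + 1], corr[size - max_neg:]))
    return int(lags[np.argmax(values)])


if __name__ == "__main__":
    leader = np.zeros(20)
    follower = np.zeros(20)
    leader[5] = 1.0    # action at sample 5
    follower[8] = 1.0  # reaction at sample 8
    print(global_delay(leader, follower))
```

For two-dimensional trajectories, one possible extension is to sum the correlation arrays of the x and y coordinates before taking the maximum.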
We show by reduction from the Orthogonal Vectors problem that algorithms with strongly subquadratic running time cannot approximate the Fréchet distance between curves better than a factor 3 unless SETH fails. We show that similar reductions cannot achieve a lower bound with a factor better than 3. Our lower bound holds for the continuous, the discrete, and the weak discrete Fréchet distance even for curves in one dimension. Interestingly, the continuous weak Fréchet distance behaves differently. Our lower bound still holds for curves in two dimensions and higher. However, for curves in one dimension, we provide an exact algorithm to compute the weak Fréchet distance in linear time.
ACM Subject Classification: I.3.5 Computational Geometry and Object Modeling
Weak Fréchet Distance is Faster if it is Continuous and in One Dimension
[…] distance in one dimension can also be used as a subroutine for approximating the Fréchet distance for curves in two and higher dimensions [8]. Bringmann's lower bound sparked renewed interest in the computation of the Fréchet distance between one-dimensional curves. Cabello and Korman showed that for two 1D curves that do not overlap, the Fréchet distance can be computed in linear time (personal communication, referenced in [8]). Furthermore, Buchin et al. [15] proved that if one of the curves visits any location at most a constant number of times, then the Fréchet distance can be computed in near linear time. Both results apply only to restricted classes of curves and hence the general case in 1D remained open.
Our results. In this paper we settle the general question for one dimension: we give a conditional lower bound for the Fréchet distance between two general polygonal curves in 1D. To do so we reduce (in linear time) from the Orthogonal Vector Problem: given two sets of vectors, is there a pair of orthogonal vectors, one from each set? For vectors of dimension d = ω(log n) no algorithm running in strongly subquadratic time is known.
Furthermore, an algorithm with such a running time does not exist in various computational models [24] and would have far-reaching consequences [1]. In particular, the existence of a strongly subquadratic algorithm for the Orthogonal Vector Problem would imply that the Strong Exponential Time Hypothesis fails. Our reduction hence implies that no strongly subquadratic algorithm for approximating the Fréchet distance within a factor less than 3 exists unless SETH fails.
Our result also improves upon the previously best known conditional lower bound for curves in 2D by Bringmann and Mulzer [9] (approximation within a factor less than 1.399). Furthermore, we argue that similar reductions, based on a "traditional" encoding of the Orthogonal Vectors Problem, cannot achieve a lower bound better than 3.
Section 2 gives various definitions and background. In particular, we recall an asymmetric variant of the Fréchet distance introduced by Alt and Godau [4], the so-called partial Fréchet distance. In Section 3 we succinctly state all our results and in Section 4 we b...
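For context on the quadratic running time that the lower bound refers to, the sketch below shows the textbook dynamic program for the discrete Fréchet distance, which fills an n × m table. It is included only as a baseline illustration; it is not the reduction or the linear-time weak Fréchet algorithm from the paper, and the function name `discrete_frechet` is chosen for this example. Curves are given as sequences of coordinate tuples, so a one-dimensional curve uses 1-tuples.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point sequences.

    Uses the classic O(n * m) dynamic program, keeping only two rows
    of the table in memory.
    """
    n, m = len(p), len(q)
    prev = [0.0] * m
    for i in range(n):
        cur = [0.0] * m
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                cur[j] = d
            elif i == 0:
                cur[j] = max(cur[j - 1], d)   # only q can advance
            elif j == 0:
                cur[j] = max(prev[0], d)      # only p can advance
            else:
                # Best predecessor: advance p, advance both, or advance q.
                best = min(prev[j], prev[j - 1], cur[j - 1])
                cur[j] = max(best, d)
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Two one-dimensional curves, written as sequences of 1-tuples.
    curve_a = [(0.0,), (4.0,)]
    curve_b = [(0.0,), (2.0,), (4.0,)]
    print(discrete_frechet(curve_a, curve_b))
```

The doubly nested loop is what makes this approach quadratic; the conditional lower bound quoted above says that, unless SETH fails, no strongly subquadratic algorithm can even approximate the distance within a factor less than 3.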
Automatic extraction of channel networks from topography in systems with multiple interconnected channels, like braided rivers and estuaries, remains a major challenge in hydrology and geomorphology. Representing channelized systems as networks provides a mathematical framework for analyzing transport and geomorphology. In this paper, we introduce a mathematically rigorous methodology and software for extracting channel network topology and geometry from digital elevation models (DEMs) and analyze such channel networks in estuaries and braided rivers. Channels are represented as network links, while channel confluences and bifurcations are represented as network nodes. We analyze and compare DEMs from the field and those generated by numerical modeling. We use a metric called the volume parameter that characterizes the volume of deposited material separating channels to quantify the volume of reworkable sediment deposited between links, which is a measure for the spatial scale associated with each network link. Scale asymmetry is observed in most links downstream of bifurcations, indicating geometric asymmetry and bifurcation stability. The length of links relative to system size scales with volume parameter value to the power of 0.24-0.35, while the number of links decreases and does not exhibit power law behavior. Link depth distributions indicate that the estuaries studied tend to organize around a deep main channel that exists at the largest scale while braided rivers have channel depths that are more evenly distributed across scales. The methods and results presented establish a benchmark for quantifying the topology and geometry of multichannel networks from DEMs with a new automatic extraction tool.
Plain Language Summary
Channels are features of the Earth's surface that carry water and other material across the continents toward the coasts.
We have long recognized that knowing the shapes, sizes, and connections of channels in rivers, estuaries, and deltas is vital for understanding and predicting future change. However, automatically identifying channel networks from surface elevation is challenging because channels display a wide range of different shapes, sizes, and patterns, including shallow and deep areas, and often have many intersections with other channels. We have developed a method for identifying channel networks from elevation surveys. We first find the "lowest path" in a channel network, meaning the channel that is at generally lower elevations than all other channels. Then we subsequently find the next lowest paths, where the measure for channel separation is the volume of sediment between channels. This method allows us to identify the channel network and analyze its shape and pattern. We show similarities and differences between the channel networks of estuaries and wide rivers with sand bars then compare channel networks found in nature and those generated in computer simulations. Our work helps researchers more fully understand and predict how channel networks develop and evolve.
Automatic and objective extraction of channel networks from topography in systems with multiple interconnected channels, like braided rivers and estuaries, remains a major challenge in hydrology and geomorphology. Representing channelized systems as networks provides a mathematical framework for analyzing transport and geomorphology. In this paper, we introduce a mathematically rigorous methodology and software for extracting channel network topology and geometry from digital elevation models (DEMs) and analyze such channel networks in estuaries and braided rivers. Channels are represented as network links, while channel confluences and bifurcations are represented as network nodes. We analyze and compare DEMs from the field and those generated by numerical modeling. We introduce a metric called the sand function that characterizes the volume of deposited material separating channels to quantify the spatial scale attributed to each link. Scale asymmetry is observed in the majority of links downstream of bifurcations, indicating geometric asymmetry and bifurcation stability. The length of links relative to system size scales with sand function scale to the power of 0.24-0.35, while the number of nodes decreases against system scale and does not exhibit power-law behavior. Link depth distributions indicate that the estuaries studied tend to organize around a deep main channel that exists at the largest scale while braided rivers have channel depths that are more evenly distributed across scales. The methods and results presented establish a benchmark for quantifying the topology and geometry of multi-channel networks from DEMs with an automatic and objective tool.