In this note, we characterize the embeddability of generic Kimura 3ST Markov matrices in terms of their eigenvalues. As a consequence, we compute the volume of the embeddable matrices relative to the volume of all Markov matrices within the model. We also provide examples showing that, in general, mutation rates are not identifiable from substitution probabilities. These examples also illustrate that symmetries between substitution probabilities do not necessarily arise from symmetries between the corresponding mutation rates.
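In the simplest generic situation the eigenvalue condition can be made explicit. Assume the Kimura 3ST matrix has the pattern [[a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a]] and that its eigenvalues 1, x, y, z are positive and pairwise distinct. By Culver's theorem the principal logarithm is then its only real logarithm, so embeddability reduces to checking that the rates alpha, beta, gamma of that logarithm are nonnegative, i.e. x ≥ yz, y ≥ xz and z ≥ xy. The Python sketch below (function names are our own, hypothetical) computes these rates; it covers only this special case and is not a restatement of the note's full characterization.

```python
import numpy as np

def k3st_principal_rates(a, b, c, d):
    """Rates (alpha, beta, gamma) of the principal logarithm of the K3ST matrix
    [[a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a]]."""
    if not np.isclose(a + b + c + d, 1.0):
        raise ValueError("rows of a Markov matrix sum to 1")
    # Eigenvalues other than 1; the eigenvectors are the +-1 Hadamard vectors.
    x = a + b - c - d
    y = a - b + c - d
    z = a - b - c + d
    if min(x, y, z) <= 0 or len({round(v, 12) for v in (1.0, x, y, z)}) < 4:
        raise ValueError("sketch assumes positive, pairwise distinct eigenvalues")
    lx, ly, lz = np.log([x, y, z])
    # A K3ST rate matrix with first row (-(alpha+beta+gamma), alpha, beta, gamma)
    # has eigenvalues -2(beta+gamma), -2(alpha+gamma), -2(alpha+beta) on the
    # eigenvectors of x, y, z. Solving log x = -2(beta+gamma), etc., gives:
    alpha = (lx - ly - lz) / 4
    beta = (ly - lx - lz) / 4
    gamma = (lz - lx - ly) / 4
    return alpha, beta, gamma

def k3st_is_embeddable(a, b, c, d, tol=1e-12):
    """Generic positive case only: embeddable iff all principal-log rates are >= 0."""
    return min(k3st_principal_rates(a, b, c, d)) >= -tol
```

For example, the matrix with first row (0.625, 0.025, 0.125, 0.225) has eigenvalues x = 0.3, y = 0.5, z = 0.7; since x < yz, alpha is negative and the sketch reports it as not embeddable.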
Deciding whether a substitution matrix is embeddable (i.e. whether the corresponding Markov process has a continuous-time realization) is an open problem even for 4 × 4 matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these 4 × 4 matrices, we fully characterize the set of embeddable K80 Markov matrices and the set of embeddable matrices for which rates are identifiable. In particular, we describe an open subset of embeddable matrices with non-identifiable rates. This set contains matrices with positive eigenvalues as well as diagonal-largest-in-column matrices, which may have consequences for parameter estimation in phylogenetics. Finally, we compute the relative volumes of embeddable K80 matrices and of embeddable matrices with identifiable rates. This study completes the solution of the embedding problem for the more general K81 model and its submodels, a study initiated by the last two authors in separate work.
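Part of the difficulty is spectral. With states ordered A, G, C, T and the parametrization below (both our assumptions), a K80 matrix always has the double eigenvalue y = a − b, so whenever its eigenvalues are positive it has infinitely many real logarithms (Culver's theorem). The Python sketch below (hypothetical names) builds such a matrix and computes the rates of its principal logarithm only. If these rates are nonnegative the matrix is embeddable, but the test ignores all other logarithms: a negative result does not rule out embeddability, and a positive one does not establish that the rates are identifiable. It is not the characterization obtained in the paper.

```python
import numpy as np

def k80_matrix(a, b, c):
    """K80 Markov matrix with states ordered A, G, C, T.

    a: no substitution, b: transition (A<->G, C<->T), c: each transversion."""
    return np.array([[a, b, c, c],
                     [b, a, c, c],
                     [c, c, a, b],
                     [c, c, b, a]], dtype=float)

def k80_principal_rates(a, b, c):
    """(transition rate, transversion rate) of the principal logarithm."""
    x = a + b - 2 * c  # simple eigenvalue, eigenvector (1, 1, -1, -1)
    y = a - b          # double eigenvalue, eigenvectors (1, -1, 0, 0), (0, 0, 1, -1)
    if min(x, y) <= 0:
        raise ValueError("sketch assumes positive eigenvalues")
    # A K80 rate matrix with first row (-(alpha + 2*beta), alpha, beta, beta)
    # satisfies log x = -4*beta and log y = -2*(alpha + beta).
    alpha = (np.log(x) - 2 * np.log(y)) / 4
    beta = -np.log(x) / 4
    return alpha, beta
```

Calling `np.linalg.eigvalsh(k80_matrix(a, b, c))` on any such matrix shows the eigenvalue a − b twice, which is the source of the extra logarithms.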
Characterizing whether a Markov process of discrete random variables has a homogeneous continuous-time realization is a hard problem. In practice, this problem reduces to deciding when a given Markov matrix can be written as the exponential of some rate matrix (a Markov generator). This is an old question known in the literature as the embedding problem [Elf37], which has only been solved for matrices of size 2 × 2 or 3 × 3. In this paper, we address this problem and related questions and obtain results along two lines. First, for matrices of any size, we give a bound on the number of Markov generators in terms of the spectrum of the Markov matrix. Based on this, we establish a criterion for deciding whether a generic Markov matrix (one with pairwise distinct eigenvalues) is embeddable and propose an algorithm that lists all its Markov generators. Then, motivated and inspired by recent results on substitution models of DNA, we focus on the 4 × 4 case and completely solve the embedding problem for any Markov matrix. The solution in this case is more concise, as embeddability is given by a single condition.
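The reduction to matrix logarithms can be illustrated for a generic matrix. If M = P diag(λ_1, ..., λ_n) P^{-1} has pairwise distinct eigenvalues, every logarithm of M commutes with M and so has the form P diag(Log λ_j + 2πi k_j) P^{-1} for integers k_j. The Python sketch below (NumPy only; `markov_generators` and `is_rate_matrix` are our own hypothetical names) enumerates the real logarithms whose branch shifts pass a crude bound |Im μ| ≤ −log det M, obtained from Gershgorin's theorem, and keeps those that are rate matrices. It is not the criterion, bound, or algorithm of the paper, only a brute-force sketch of the same idea.

```python
import itertools
import numpy as np

def is_rate_matrix(Q, tol=1e-9):
    """True if Q is (numerically) real with off-diagonal entries >= 0 and zero row sums."""
    Q = np.asarray(Q)
    if np.max(np.abs(np.imag(Q))) > tol:
        return False
    R = np.real(Q)
    off_diag = R - np.diag(np.diag(R))
    return bool(np.all(off_diag >= -tol) and np.allclose(R.sum(axis=1), 0.0, atol=tol))

def markov_generators(M, tol=1e-9):
    """List the Markov generators Q with exp(Q) = M, assuming distinct eigenvalues."""
    vals, P = np.linalg.eig(M)
    vals = vals.astype(complex)
    if len(set(np.round(vals, 8))) < len(vals):
        raise ValueError("this sketch assumes pairwise distinct eigenvalues")
    det = np.linalg.det(M)
    if det <= 0:
        return []  # det exp(Q) = exp(tr Q) > 0
    # Gershgorin: every eigenvalue mu of a rate matrix satisfies
    # |Im mu| <= max_i |q_ii| <= -tr Q = -log det M.
    bound = -np.log(det)
    principal = np.log(vals)  # principal branch of each eigenvalue
    # A distinct real eigenvalue has no conjugate partner, so for Q to be real
    # it must be positive and keep its principal logarithm.
    if any(v.real <= 0 for v in vals if abs(v.imag) <= tol):
        return []
    upper = [i for i, v in enumerate(vals) if v.imag > tol]
    lower = [int(np.argmin(np.abs(vals - np.conj(vals[i])))) for i in upper]
    # Branch shifts k allowed by the bound, one integer per conjugate pair.
    kmax = int(np.ceil(bound / (2 * np.pi))) + 1
    choices = [[k for k in range(-kmax, kmax + 1)
                if abs(principal[i].imag + 2 * np.pi * k) <= bound + tol]
               for i in upper]
    Pinv = np.linalg.inv(P)
    generators = []
    for ks in itertools.product(*choices):
        mu = principal.copy()
        for iu, il, k in zip(upper, lower, ks):
            mu[iu] += 2j * np.pi * k  # shift the pair conjugately so Q stays real
            mu[il] -= 2j * np.pi * k
        Q = P @ np.diag(mu) @ Pinv
        if is_rate_matrix(Q, tol):
            generators.append(np.real(Q))
    return generators
```

The enumeration is finite because each conjugate pair admits only the shifts k compatible with the bound; real eigenvalues contribute no choices at all.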