G. H. Tasca scite author profile

Tasca

2020

4open

In this paper, we investigate a specific structure within the theoretical framework of Partition Markov Models (PMM) [see García Jesús and González-López, Entropy 19, 160 (2017)]. The structure of interest lies in the formulation of the underlying partition that defines the process: in addition to the finite memory o associated with the process, a parameter G is introduced, allowing an extra dependence on the past that complements the dependence given by the usual memory o. We show, by simulations, how algorithms designed for the classic version of the PMM can have difficulty recovering the structure investigated here. This specific structure is efficient for modeling a complete genome sequence of the newly decoded Coronavirus Covid-19 in humans [see Wu et al., Nature 579, 265–269 (2020)]. The sequence profile is represented by 13 units (parts of the state space’s partition); for each of the 13 units, the transition probabilities are computed for every element of the genetic alphabet. The structure proposed here also allows us to develop a comparison study with other genomic sequences of Coronavirus collected in the last 25 years, through which we conclude that Covid-19 appears close to SARS-like Coronaviruses (SL-CoVs) from bat specimens in Zhoushan [see Hu et al., Emerg Microb Infect 7, 1–10 (2018)].


An Efficient Coding Technique for Stochastic Processes

Tasca

et al. 2021

Entropy

In the framework of coding theory, under the assumption of a Markov process (Xt) on a finite alphabet A, the compressed representation of the data is composed of a description of the model used to code the data and the encoded data. Given the model, Huffman’s algorithm is optimal with respect to the number of bits needed to encode the data. On the other hand, modeling (Xt) through a Partition Markov Model (PMM) reduces the number of transition probabilities needed to define the model. This paper shows how the use of a Huffman code with a PMM reduces the number of bits needed in this process. We prove that the estimation of a PMM allows for estimating the entropy of (Xt), providing an estimator of the minimum expected codeword length per symbol. We show the efficiency of the new methodology in a simulation study and in a real problem of compressing DNA sequences of SARS-CoV-2, obtaining a reduction of at least 10.4% on the real data.
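The link between Huffman coding and entropy mentioned in this abstract can be illustrated with a small sketch. It is not the paper's method: it builds a Huffman code for a single, made-up transition distribution over a hypothetical alphabet {a, c, g, t} and compares the expected codeword length with the Shannon entropy. In a Markov or partition Markov setting one would, presumably, build one such code per state or per part of the partition. The function names are illustrative.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return {symbol: codeword length} for a Huffman code built on probs."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def expected_length(probs):
    """Expected Huffman codeword length in bits per symbol."""
    lengths = huffman_lengths(probs)
    return sum(probs[s] * lengths[s] for s in probs)


# Hypothetical transition distribution for one state (assumed values).
example = {"a": 0.5, "c": 0.25, "g": 0.125, "t": 0.125}
print(huffman_lengths(example), expected_length(example), entropy(example))
```

For this dyadic distribution the expected codeword length coincides with the entropy; for general distributions the Huffman length lies between the entropy and the entropy plus one bit.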


A stochastic inspection about genetic variants of Covid-19 circulating in Brazil during 2020

García,

González-López,

Tasca

2022

Multiple partition Markov model for B.1.1.7, B.1.351, B.1.617.2, and P.1 variants of SARS-CoV 2 virus

Tasca

2022

Comput Stat

With tools originating from Markov processes, we investigate the similarities and differences between genomic sequences in FASTA format coming from four variants of the SARS-CoV 2 virus: B.1.1.7 (UK), B.1.351 (South Africa), B.1.617.2 (India), and P.1 (Brazil). We treat the virus’ sequences as samples of finite memory Markov processes acting on the alphabet A. We model each sequence, revealing some heterogeneity between sequences belonging to the same variant. We identify the five most representative sequences for each variant using a robust notion of classification, see Fernández et al. (Math Methods Appl Sci 43(13):7537–7549. 10.1002/mma.5705). Using a notion derived from a metric between processes, see García et al. (Appl Stoch Models Bus Ind 34(6):868–878. 10.1002/asmb.2346), we identify four groups, each group representing a variant. This metric also detects a global proximity between the variants B.1.351 and B.1.1.7. With the selected sequences, we assemble a multiple partition model, see Cordeiro et al. (Math Methods Appl Sci 43(13):7677–7691. 10.1002/mma.6079), revealing in which states of the state space the variants differ with respect to the mechanisms for choosing the next element in A. Through this model, we identify that the variants differ in their transition probabilities in eleven states out of a total of 256 states. For these eleven states, we reveal how the transition probabilities change from variant (group of variants) to variant (group of variants). In other words, we indicate precisely the stochastic reasons for the discrepancies.
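As a rough illustration of what "transition probabilities per state" means here, the sketch below estimates them by counting in a sequence. It assumes a four-letter alphabet {a, c, g, t} and memory 4, which would give 4^4 = 256 states; the abstract states only the total of 256 states, so both choices are assumptions. This is a plain maximum-likelihood count, not the multiple partition model of the paper, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

ALPHABET = "acgt"  # assumed genetic alphabet


def transition_estimates(seq, order=4):
    """Estimate P(next symbol | previous `order` symbols) by counting."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        window = seq[i - order:i + 1]
        # Skip windows containing symbols outside the alphabet (e.g. 'n').
        if all(ch in ALPHABET for ch in window):
            counts[window[:-1]][window[-1]] += 1
    probs = {}
    for state, c in counts.items():
        total = sum(c.values())
        probs[state] = {a: c[a] / total for a in ALPHABET}
    return probs


# Toy sequence (made up); a real input would be read from a FASTA file.
toy = "acgtacgtacga"
est = transition_estimates(toy, order=2)
print(est["cg"])
```

Comparing such estimated rows state by state across variants is the kind of question the abstract describes: in which states the distributions for the next symbol differ.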


Inferência bayesiana para distribuições de cauda longa [Bayesian inference for long-tailed distributions]

Tasca

Stochastic Comparison Between the Original SARS-CoV 2 Genetic Structure and SARS-CoV 2 - P.1 Variant

Tasca

2022