Jesús E. García scite author profile

The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon, and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are first encoded according to some basic rhythmic features of the sentences, which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, which is a constant-free procedure for model selection. As a by-product, this provides a solution to the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also carry out a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.

Consistent Estimation of Partition Markov Models

García

González‐López

2017

Entropy

Abstract: The Partition Markov Model characterizes the process by a partition L of the state space, where the elements in each part of L share the same transition probability to an arbitrary element in the alphabet. This model aims to answer the following questions: what is the minimal number of parameters needed to specify a Markov chain, and how can these parameters be estimated? To answer these questions, we build a consistent strategy for model selection, which consists of the following: given a realization of the process of size n, find a model within the Partition Markov class with a minimal number of parts to represent the process law. From the strategy, we derive a measure that establishes a metric in the state space. In addition, we show that if the law of the process is Markovian, then, eventually, when n goes to infinity, L will be retrieved. We show an application to model internet navigation patterns.
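The idea of scoring a candidate partition of the state space can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names `transition_counts` and `partition_bic` are hypothetical, and the penalty term (|A| - 1) * |L| / 2 * ln(n) is a standard BIC form assumed here for illustration. States in the same part have their counts pooled, so they share one estimated transition law.

```python
from collections import Counter, defaultdict
from math import log

def transition_counts(seq, order):
    """Count how often each length-`order` state is followed by each symbol."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        state = tuple(seq[i - order:i])
        counts[state][seq[i]] += 1
    return counts

def partition_bic(counts, partition, alphabet, n):
    """Pooled maximum log-likelihood minus a penalty per part (assumed BIC form)."""
    loglik = 0.0
    for part in partition:
        # States in the same part share one transition law: pool their counts.
        pooled = Counter()
        for state in part:
            pooled.update(counts.get(state, Counter()))
        total = sum(pooled.values())
        for c in pooled.values():
            loglik += c * log(c / total)
    # Each part contributes (|A| - 1) free transition probabilities.
    penalty = (len(alphabet) - 1) * len(partition) / 2 * log(n)
    return loglik - penalty

seq = list("0101010101")
counts = transition_counts(seq, order=1)
separate = partition_bic(counts, [[("0",)], [("1",)]], "01", len(seq))
merged = partition_bic(counts, [[("0",), ("1",)]], "01", len(seq))
print(separate > merged)  # the two states behave differently, so merging is penalized
```

In this toy sequence the two states have opposite transition behavior, so the partition that keeps them apart scores higher than the one that merges them; a search over partitions would prefer the candidate with the highest score.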

A new index to measure positive dependence in trivariate distributions

García

González‐López

Nelsen

2013

Journal of Multivariate Analysis

Sample selection procedure in daily trading volume processes

Fernández

García

Gholizadeh

et al. 2019

Math Methods in App Sciences

In this paper, we propose a procedure for selecting samples from a set of samples coming from Markovian processes of finite order and finite alphabet. Under the assumption that there exists a law that prevails in at least q% of the samples of the collection, we show that the procedure makes it possible to identify the samples governed by the predominant law. The approach is based on a local metric between samples, which tends to zero when we compare samples of identical law and tends to infinity when we compare samples with different laws. The local metric allows us to define a criterion that takes arbitrarily large values when the assumption about the existence of a predominant law does not hold. By means of this procedure, we map similarities and dissimilarities in the daily trading volume dynamics of some Brazilian stocks.

Partition Markov model for multiple processes

Cordeiro

García

González‐López

et al. 2020

Math Methods in App Sciences

In this paper, we analyze the model proposed in García and Londoño [1], in which a set of p independent sequences of discrete time Markov chains is considered, over a finite alphabet A and with finite order o. The model is obtained by identifying the states of the state space A^o where two or more sequences share the same transition probabilities (see also García and González‐López [2]). This identification establishes a partition on {1,…,p}×A^o, the set of sequences and the state space. We show that by means of the Bayesian information criterion (BIC), the partition can be estimated eventually almost surely. Also, in García and Londoño [1], a notion of divergence derived from the BIC is given, which serves to identify the proximity/discrepancy between elements of {1,…,p}×A^o (see also García et al. [3]). In the present article, we prove that this notion is a metric in the space where the model is built and that it is statistically consistent for determining proximity/discrepancy between the elements of the space {1,…,p}×A^o. We apply the notions discussed here to the construction of a parsimonious model that represents the common stochastic structure of 153 complete genomic Zika sequences, coming from tropical and subtropical regions.

A BIC‐based consistent metric between Markovian processes

García

Gholizadeh

González‐López

2018

Appl Stoch Models Bus & Ind

In this paper, we address the problem of deciding whether two independent samples coming from discrete Markovian processes are governed by the same stochastic law. We establish a local metric between samples based on the Bayesian information criterion. In addition, we derive the bound that must be used in this metric to make the decision. When it is decided that the laws are not the same, the metric makes it possible to detect the specific elements of the state space where the discrepancies are manifested. We prove that the metric is statistically consistent for detecting whether the samples follow the same law, tending to zero when the sample sizes increase. Moreover, we show that the metric assumes arbitrarily large values when the sample sizes increase and the stochastic laws are different. This concept is applied to analyze two production lines of alcohol fuel, described by five variables each. We identify the variables that contribute most to the discrepancy and, using the local nature of the metric, we list the realizations in which the processes behave differently.

Keywords: Bayesian information criterion, Markov processes, proximity between processes, relative entropy
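A state-by-state comparison of two samples can be sketched in Python as follows. This is only an illustrative statistic under stated assumptions, not the paper's exact definition: the names `next_symbol_counts` and `local_discrepancy` are hypothetical, and the normalization by (|A| - 1) * ln(n1 + n2) is an assumed BIC-style scaling. For a fixed state, the sketch measures how much log-likelihood is lost when the two samples are forced to share one transition law.

```python
from collections import Counter
from math import log

def next_symbol_counts(seq, state):
    """Count the symbols that follow each occurrence of `state` in `seq`."""
    k = len(state)
    return Counter(
        seq[i] for i in range(k, len(seq)) if tuple(seq[i - k:i]) == tuple(state)
    )

def _loglik(counts):
    """Maximum log-likelihood of the observed next-symbol counts."""
    total = sum(counts.values())
    return sum(c * log(c / total) for c in counts.values() if c > 0)

def local_discrepancy(seq1, seq2, state, alphabet_size):
    """Log-likelihood lost by pooling both samples at `state`, rescaled.

    The scaling by (alphabet_size - 1) * ln(n1 + n2) is an assumption made
    for this illustration.
    """
    c1 = next_symbol_counts(seq1, state)
    c2 = next_symbol_counts(seq2, state)
    gain = _loglik(c1) + _loglik(c2) - _loglik(c1 + c2)
    return gain / ((alphabet_size - 1) * log(len(seq1) + len(seq2)))

alternating = "0101010101"
constant = "0000000000"
print(local_discrepancy(alternating, alternating, ("0",), 2))
print(local_discrepancy(alternating, constant, ("0",), 2))
```

The first call compares a sample with itself and gives zero; the second compares samples whose behavior after the state `0` differs and gives a strictly positive value. Evaluating the statistic for each state separately is what makes the comparison local.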

Partition Markov Model for Covid-19 Virus

García

González‐López

Tasca

2020

4open

In this paper, we investigate a specific structure within the theoretical framework of Partition Markov Models (PMM) [see García Jesús and González-López, Entropy 19, 160 (2017)]. The structure of interest lies in the formulation of the underlying partition that defines the process: in addition to a finite memory o associated with the process, a parameter G is introduced, allowing an extra dependence on the past that complements the dependence given by the usual memory o. We show, by simulations, how algorithms designed for the classic version of the PMM can have difficulties in recovering the structure investigated here. This specific structure is efficient for modeling a complete genome sequence coming from the newly decoded Coronavirus Covid-19 in humans [see Wu et al., Nature 579, 265–269 (2020)]. The sequence profile is represented by 13 units (parts of the state space's partition). For each of the 13 units, the respective transition probabilities are computed for every element of the genetic alphabet. Also, the structure proposed here allows us to develop a comparison study with other genomic sequences of Coronavirus, collected in the last 25 years, through which we conclude that Covid-19 appears close to SARS-like Coronaviruses (SL-CoVs) from bat specimens in Zhoushan [see Hu et al., Emerg Microb Infect 7, 1–10 (2018)].

Surprise, p-value, s-value and a diagnostic procedure to detect not informative experiments

Recchia

Ostermann

García

2016
