2012
DOI: 10.1214/11-aoas511

Context tree selection and linguistic rhythm retrieval from written texts

Abstract: The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length…

Cited by 54 publications (65 citation statements)
References 20 publications (25 reference statements)

“…The distance allows the construction of consistent estimation algorithms to identify the partition. Research in progress suggests that this measure can be harnessed for the development and implementation of robust estimation techniques, given that there are records (see [19]) of the need of these techniques for Markov processes. In summary, in this paper, in addition to responding positively to the question of whether the Bayesian Information Criterion is capable of allowing a consistent estimation of the partition of the Markov process, we also obtain that, in terms of the model selection procedure, the Bayesian Information Criterion corresponds to a distance in the state space of the Markov process.…”
Section: Discussion (mentioning)
confidence: 99%
“…The rhythm of language appears to be culturally regulated. Galves et al [10] extract streams of stresses from corpora of newspaper articles written in both European and Brazilian Portuguese, and use Variable Length Markov Chains [11] to model rhythmic realization in the two corpora, arriving at different final models. Where cultural background influences linguistic rhythm, it similarly influences musical rhythm, as shown by Patel and Daniele [12].…”
Section: Related Work (mentioning)
confidence: 99%
“…There are diverse methodologies for VLMC model selection (see Rissanen, 1983; Buhlmann et al, 1999; Csiszár et al, 2006; Galves et al, 2012). This article uses version of the Context Tree Maximization (from now on CTM) algorithm introduced by Csiszár et al (2006), which is based on the Bayesian Information Criterion (BIC).…”
Section: Introduction (mentioning)
confidence: 99%
“…The strategy introduced in this article was applied in Garcia et al (2012) to linguistic data from eight languages. In Garcia et al (2012), the objective was to check a linguistic conjecture that classifies languages into three rhythmic classes: "syllable-timed," "stress-timed" and "mora-timed."…”
Section: Introduction (mentioning)
confidence: 99%