Abstract. We present a measure of cognitive complexity for subclasses of the regular languages that is based on model-theoretic complexity rather than on description length of particular classes of grammars or automata. Unlike description length approaches, this complexity measure is independent of the implementation details of the cognitive mechanism. Hence, it provides a basis for making inferences about cognitive mechanisms that are valid regardless of how those mechanisms are actually realized.
We introduce algorithms that, given a finite-state automaton (FSA), compute a minimal set of forbidden local factors that define a Strictly Local (SL) tight approximation of the stringset recognised by the FSA, the set of forbidden piecewise factors that define a Strictly Piecewise (SP) tight approximation of that stringset, and a set of co-SL factors that, together with the SL and SP factors, provide a set of purely conjunctive literal constraints defining a minimal superset of the stringset recognised by the automaton. Using these, we have built computational tools that have allowed us to reproduce, by nearly purely computational means, the work of Rogers et al. (2012), in which, using a mix of computational and analytical techniques, they completely characterised, with respect to the Local and Piecewise Subregular hierarchies, the constraints on the distribution of stress in human languages that are documented in the StressTyp2 database. Our focus in this paper is on the algorithms and the method of their application. The phonology of stress patterns is a particularly good domain of application since, as we show here, such patterns generally fall at the very lowest levels of complexity. We discuss these phonological results here, but do not consider their consequences in depth.
The work presented here continues a program of completely characterizing the constraints on the distribution of stress in human languages that are documented in the StressTyp2 database with respect to the Local and Piecewise Subregular hierarchies. We introduce algorithms that, given a finite-state automaton (FSA), compute a set of forbidden words, units, initial factors, free factors and final factors that define a Strictly Local (SL) approximation of the stringset recognized by the FSA, along with a minimal DFA that recognizes the residue set: the set of strings in the approximation that are not in the stringset recognized by the FSA. If the FSA recognizes an SL stringset, then the approximation is exact; otherwise it overgenerates. We have applied these tools to the 106 lects that have associated DFAs in the StressTyp2 database, a wide-coverage corpus of stress patterns that are attested in human languages. The results include a large number of strictly local constraints that have not been included in prior work categorizing these patterns with respect to the Local and Piecewise Subregular hierarchies of Rogers et al. (2012), although, of course, they do not contradict the central result of that work, which establishes an upper bound on their complexity that includes strictly local constraints.
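To make the notion of a Strictly Local stringset concrete, the following sketch illustrates the idea in miniature (this is an assumed toy illustration, not the FSA-based algorithm of the abstracts above): an SL-k stringset is defined by a finite set of permitted k-factors over boundary-padded strings, and membership is decided by a simple scanner that checks every k-factor of a word against that set.

```python
K = 2
BOUND = "#"  # word-boundary marker

def k_factors(word, k=K):
    """All length-k substrings of the boundary-padded word."""
    padded = BOUND + word + BOUND
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}

def learn_sl_grammar(sample, k=K):
    """The k-factors observed in a positive sample: an SL-k grammar
    of permitted factors (everything unobserved is forbidden)."""
    grammar = set()
    for w in sample:
        grammar |= k_factors(w, k)
    return grammar

def accepts(grammar, word, k=K):
    """A word is accepted iff every one of its k-factors is permitted."""
    return k_factors(word, k) <= grammar

# Toy pattern over {a, b}: no adjacent 'bb' (an SL-2 stringset).
sample = ["ab", "aab", "aba", "abab"]
g = learn_sl_grammar(sample)
assert accepts(g, "ababa")       # no forbidden factor occurs
assert not accepts(g, "abba")    # contains the forbidden 2-factor 'bb'
```

Equivalently, the grammar can be stated negatively as the complement set of forbidden factors; the abstracts above compute exactly such minimal forbidden-factor sets directly from an automaton rather than from a sample.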
We derive well-understood and well-studied subregular classes of formal languages purely from the computational perspective of algorithmic learning problems. We parameterise the learning problem along dimensions of representation and inference strategy. Of special interest are those classes of languages whose learning algorithms are necessarily not prohibitively expensive in space and time, since learners are often exposed to adverse conditions and sparse data. Learned natural language patterns are expected to be most like the patterns in these classes, an expectation supported by previous typological and linguistic research in phonology. A second result is that the learning algorithms presented here are completely agnostic to choice of linguistic representation. In the case of the subregular classes, the results fall out from traditional model-theoretic treatments of words and strings. The same learning algorithms, however, can be applied to model-theoretic treatments of other linguistic representations such as syntactic trees or autosegmental graphs, which opens a useful direction for future research.
We derive abstract characterizations of the Strictly Piecewise Local (SPL) and Piecewise Locally Testable (PLT) stringsets. These generalize both the Strictly Local/Locally Testable stringsets (SL and LT) and Strictly Piecewise/Piecewise Testable stringsets (SP and PT) in that SPL constraints can be stated in terms of both adjacency and precedence. We do this in a fully abstract setting which applies to any class of purely relational models that label the points in their domain with some finite labeling alphabet. This includes, for example, labeled trees and graphs. The actual structure of the class of intended models only shows up in interpreting the abstract characterizations of the definable sets in terms of the structure of the models themselves.
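The adjacency/precedence distinction can be sketched as follows (an assumed toy illustration, not the construction from the abstract): where Strictly Local constraints forbid substrings (contiguous factors), Strictly Piecewise constraints forbid subsequences, i.e. symbols in a given order at any distance.

```python
from itertools import combinations

def subsequences(word, k):
    """All length-k subsequences (scattered substrings) of word."""
    return {"".join(c) for c in combinations(word, k)}

def sp_accepts(forbidden, word, k=2):
    """Accept iff no forbidden k-subsequence occurs in the word."""
    return not (subsequences(word, k) & forbidden)

# Toy SP-2 constraint: no 's' may precede another 's' anywhere in the
# word (a long-distance, sibilant-harmony-like pattern: forbid the
# subsequence 'ss').
forbidden = {"ss"}
assert not sp_accepts(forbidden, "salts")  # s...s even though non-adjacent
assert sp_accepts(forbidden, "salt")
```

An SL-2 scanner would accept "salts", since the two occurrences of 's' are never adjacent; SPL constraints, as characterized in the abstract above, combine both modes of description in a single constraint.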