This paper is intended to lay out for broader discussion some arguments for the importance of data in work in generative syntax. These are accepted by many linguists, but a significant number of others still seem reluctant to accept them. The basic claim is that it is no longer tenable for syntactic theories to be constructed on the evidence of a single person's judgements, and that real progress can only be made when syntacticians begin to think more carefully about the empirical basis of their work and apply the minimum standards we propose. We advance two groups of reasons for syntacticians to do this, negative and positive. The negative ‘stick’ group concerns the inadequacy of current practice. We argue that linguists are producing unsatisfactory work with these methods. Data quality is a limiting factor: a theory can only ever be as good as its data base. The positive ‘carrot’ group concerns the descriptive and theoretical advantages which become available with more empirically adequate data. We hope to tempt linguists to adopt new methods by showing them the insights which better data makes available.
This article summarizes the findings of some of our studies of the data base of syntactic theory, contrasting the characteristics of frequency data and judgement data. Examination of frequency data reveals that the factors affecting its production interact competitively and probabilistically. This contrasts strongly with the patterns observed in judgement data, which point to a system in which violations of constraints produce negative weightings on form/meaning pairs. Since both data types are the result of human linguistic processing, we present a model of the architecture that such a system might have in order to produce such contrasting data. This Decathlon Model has two modules: Constraint Application and Output Selection. The first is blind and exceptionless, and applies violation costs cumulatively (Keller 2000); the second is competitive and probabilistic. This architecture constrains frameworks of syntactic explanation: an empirically adequate grammar must include gradient well-formedness, specify constraint violation costs, and distinguish between the application of rules and the selection of outputs.
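The two-module architecture described above can be illustrated with a minimal sketch. This is not the authors' implementation. The constraint names, the cost values, and the use of a softmax rule for Output Selection are all assumptions made for illustration. The abstract states only that the first module applies violation costs blindly and cumulatively and that the second is competitive and probabilistic.

```python
import math
import random

# Hypothetical constraints with illustrative violation costs (not from the paper).
VIOLATION_COSTS = {"CONSTRAINT_A": 1.0, "CONSTRAINT_B": 2.5}


def apply_constraints(violations):
    """Constraint Application: blind, exceptionless, cumulative.

    `violations` maps a constraint name to the number of times a candidate
    violates it. Every violation is charged, and the costs simply add up,
    giving a negative weighting (0.0 means no violations).
    """
    return -sum(VIOLATION_COSTS[name] * count for name, count in violations.items())


def selection_probabilities(candidates):
    """Output Selection: candidates compete for the same meaning.

    Assumption: weightings are turned into probabilities with a softmax.
    The source only says this stage is competitive and probabilistic.
    """
    scores = {form: apply_constraints(v) for form, v in candidates.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {form: math.exp(score - top) for form, score in scores.items()}
    total = sum(weights.values())
    return {form: weight / total for form, weight in weights.items()}


def select_output(candidates, rng=random):
    """Sample one form to produce, in proportion to its selection probability."""
    probs = selection_probabilities(candidates)
    forms = list(probs)
    return rng.choices(forms, weights=[probs[f] for f in forms], k=1)[0]


# Three hypothetical competing forms for one meaning.
candidates = {
    "form_1": {},
    "form_2": {"CONSTRAINT_A": 1},
    "form_3": {"CONSTRAINT_A": 1, "CONSTRAINT_B": 1},
}
```

In this sketch, `apply_constraints` plays the role of judgement data: each form receives a weighting that depends only on its own violations, so `form_3` scores -3.5 regardless of what it competes against. `selection_probabilities` plays the role of frequency data: how often a form is produced depends on the whole candidate set, so a dispreferred form is rare but not impossible.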