This study applied systematic meta‐analytic procedures to summarize findings from experimental and quasi‐experimental investigations into the effectiveness of using the tools and techniques of corpus linguistics for second language learning or use, here referred to as data‐driven learning (DDL). Analysis of 64 separate studies representing 88 unique samples reporting sufficient data indicated that DDL approaches result in large overall effects for both control/experimental group comparisons (d = 0.95) and for pre/posttest designs (d = 1.50). Further investigation of moderator variables revealed that small effect sizes were generally tied to small sample sizes. Research has barely begun in some key areas, and durability/transfer of learning through delayed posttesting remains an area in need of further investigation. Although DDL research demonstrably improved over the period investigated, further changes in practice and reporting are recommended.
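For readers unfamiliar with the d values quoted above, the following is a minimal sketch of how a standardized mean difference (Cohen's d with a pooled standard deviation) is commonly computed for a control/experimental comparison. It is an illustration under stated assumptions, not the procedure used in the meta-analysis: the function name `cohens_d` and the score lists are hypothetical, and the numbers are invented for demonstration only.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(experimental, control):
    """Standardized mean difference using a pooled standard deviation.

    Assumption: this is the textbook between-groups formula; the study
    summarized above may have used a different variant or correction.
    """
    n_e, n_c = len(experimental), len(control)
    s_e, s_c = stdev(experimental), stdev(control)  # sample SDs (n - 1)
    pooled_sd = sqrt(
        ((n_e - 1) * s_e**2 + (n_c - 1) * s_c**2) / (n_e + n_c - 2)
    )
    return (mean(experimental) - mean(control)) / pooled_sd


# Invented posttest scores, for illustration only (not data from any study).
experimental_scores = [14, 16, 18, 20, 22]
control_scores = [12, 14, 16, 18, 20]

d = cohens_d(experimental_scores, control_scores)
print(f"d = {d:.2f}")
```

A d near 1.0 means the experimental group's mean sits about one pooled standard deviation above the control group's mean.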
Open Practices
This article has been awarded Open Materials and Open Data badges. All materials and data are publicly accessible via the Open Science Framework at https://osf.io/jkktw. Learn more about the Open Practices badges from the Center for Open Science: https://osf.io/tvyxz/wiki.
The potential for corpora in language learning has attracted a significant amount of attention in recent years, including in the form of data-driven learning (DDL). Careful not to appear to over-promote the field, enthusiasts have urged caution in its application, in particular with regard to lower-level learners, and have argued that extensive learner training in corpus techniques is an essential condition for DDL to be successful. Such limits seem eminently reasonable, but there is a notable dearth of empirical studies to support them. This paper describes a simple experiment to see how lower-level learners cope with corpus data with no prior training. The language focus here is on linking adverbials in English, which are known to be difficult to teach using traditional methods. The subjects are 132 first-year students at an engineering college in France of roughly intermediate and lower levels of English. They were randomly divided into groups to compare their ability to deal with the target items using traditional sources (extracts from a bilingual dictionary or a grammar/usage manual) or corpus data (short contexts or truncated concordances). Performance was tested three times: prior to the experiment, subsequently to check ability to use the different information sources as a reference, and later to test recall. No evidence was found that traditional sources promote better recall, and corpus data seemed to be more effective for reference purposes. While the results of any single experiment must be treated with caution, these findings suggest the need for more empirical studies to complement the theoretical arguments and qualitative data which currently dominate the discussions of DDL.