Using corpora in language teaching: What researchers talk about

Corpora have long been used in language teaching and learning, both as a learning resource and as a reference tool. Though such uses are commonly associated with Tim Johns' (1990) data-driven learning (DDL), visions of them differ, not least because Johns himself covered such a range of territory. Rather than appeal only to Johns' work, we might then ask what researchers in the field actually do. This study analyses a corpus of 600,000 words derived from 110 academic papers published between 1989 and 2012 which evaluate some aspect of corpus use in language teaching or learning. The paper describes the compilation of the corpus and key findings: the "stories" they tell (Tribble 2012). For example, though Johns is the author most frequently cited, 23% of papers contain no mention of either Johns or DDL. Another common term is corpus-based: its 367 occurrences in 69 papers collocate especially with activities and approach, while the 64 occurrences of corpus-driven in 11 papers collocate more with language and research. Comparing older and more recent papers shows changes in the field: early keywords include concordancing and vocabulary, while more recent ones feature Google and writing. A keyword comparison with the British National Corpus (BNC) can be built into a general vision such as: Students using a language corpus for English writing, searching concordance data for patterns in context in various tasks provided by the teacher in the course. However, many of the main advantages attributed to the use of corpus data are notable by their relative infrequency: individualisation, constructivism, collaborative learning and noticing (and related forms) each occur less than once per 10,000 words, while no more than two papers refer frequently (ten times or more) to concepts such as responsibility, exposure, learning styles, communicative skills and autonomy, suggesting such concepts remain under-researched.
The paper builds on such findings to create a comprehensive survey of research to date, pointing the way for future work.
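
The normalised figures quoted in the abstract come down to simple arithmetic, which the following Python sketch reproduces. It assumes the corpus is exactly 600,000 words (the abstract gives a rounded figure), and the function names are illustrative only; they are not taken from the paper.

```python
# Minimal sketch: normalised frequency and paper coverage for the
# counts quoted in the abstract. Corpus size is an assumption (the
# abstract reports a rounded 600,000 words); names are hypothetical.

CORPUS_WORDS = 600_000   # approximate corpus size from the abstract
CORPUS_PAPERS = 110      # number of papers from the abstract


def per_10k(occurrences, corpus_words=CORPUS_WORDS):
    """Return occurrences per 10,000 running words."""
    return occurrences / corpus_words * 10_000


def paper_share(papers_with_term, corpus_papers=CORPUS_PAPERS):
    """Return the percentage of papers in which a term appears."""
    return papers_with_term / corpus_papers * 100


if __name__ == "__main__":
    # (term, occurrences, papers containing the term), as reported
    reported = [
        ("corpus-based", 367, 69),
        ("corpus-driven", 64, 11),
    ]
    for term, occurrences, papers in reported:
        print(
            f"{term}: {per_10k(occurrences):.2f} per 10,000 words, "
            f"in {paper_share(papers):.1f}% of papers"
        )
```

Under the same assumption, "less than once per 10,000 words" corresponds to fewer than 60 occurrences across the whole corpus, which gives a sense of how rarely the terms in question appear.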

Additional Info

Field Value
Source American Association for Corpus Linguistics
Author Boulton, Alex
Maintainer CCSD
Last Updated May 7, 2026, 04:19 (UTC)
Created May 7, 2026, 04:19 (UTC)
Identifier hal-00938165
Language en
contributor Centre de Recherches et d'Applications Pédagogiques en Langues (CRAPEL) ; Université Nancy 2
coverage San Diego, United States
creator Boulton, Alex
date 2013-01-18T00:00:00
harvest_object_id 7b03cac5-8b47-4f70-ab4f-25da5346f08a
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2025-11-04T00:00:00
set_spec type:COMM