Utilizing Corpus Technology
to Facilitate the Learning of
English Collocations
Kwanjira Chatpunnarangsee
• Corpus/Corpora: A collection of written texts or
transcriptions of spoken language that is stored electronically
and can be analyzed using a concordancer.
• Collocation: A pair or group of words that are often used
together e.g. fast car, fast food, a quick glance, a quick
meal, make an effort, keep to/stick to the rules, etc.
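To make the idea of a concordancer concrete, here is a minimal sketch of a keyword-in-context (KWIC) search in Python. The function name `kwic`, the `width` parameter, and the toy corpus are all assumptions made for illustration; this is not the tool used in the study.

```python
import re

def kwic(texts, keyword, width=30):
    """Return one keyword-in-context line per match of `keyword`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Toy corpus invented for illustration (hypothetical sentences).
toy_corpus = [
    "She was a woman of medium height with dark hair.",
    "The suspect is of medium height and slim build.",
    "They measured the height of the wall twice.",
]

for line in kwic(toy_corpus, "height"):
    print(line)
```

Reading down the aligned keyword column is what lets a learner notice recurring neighbours such as "of medium" to the left of "height".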
Types of collocation
Adjective + Noun (Unemployment is a major problem…)
Noun + Verb (The economy boomed in the 1990s.)
Noun + Noun (Let's give him a round of applause.)
Verb + Adverb (He pulled steadily on the rope and helped her to safety.)
Adverb + Adjective (They are happily married.)
Verb + Preposition (Your grade will depend on your work.)
Adjective + Preposition (The little girl is afraid of the dark.)
• English writing is considered to be a very
important skill that could help ESL/EFL students
either to have access to good jobs or to gain
admission to higher education.
• Students lack knowledge of common English
expressions/collocations, and this gap
results in their writing English with a “foreign
accent.”
• It is important for English teachers to focus on
instructional methods which can reduce the
frequency of students using this “foreign accent,”
as the benefit is twofold:
– beginning students will be able to produce a piece of writing
that proves more understandable,
– more advanced students can be encouraged to achieve
near-native writing fluency.
• ESL/EFL students do not have
sufficient linguistic input to help
them acquire English collocations.
• Corpus technology is believed to be
able to provide this type of input.
• Online concordancer
• Many studies (Chang & Sun, 2009;
Sinclair, 1997; Sun, 2003; Varley, 2008;
Yoon, 2008) suggest that corpus
consultation may be useful for language
learning because it can provide examples
of real language as it is actually used, and
this resource could be considered
invaluable input for acquisition to take
place.
• Corpus data vs. dictionaries: some dictionaries also
include collocations and example sentences
showing a word in use.
• Language in a corpus is de-contextualized as it is
often presented in a sentence-based (or even half
sentence) form (Wu, 2009).
• Students may find too many examples, which
could hinder their comprehension
to some extent.
• Students may find too few examples, or none at
all, if they conduct a search in a fairly small corpus
or in a specialized corpus that does not include a
particular word.
Corpus-based Studies in the Thai
• Todd (2001) examined students’
induction ability resulting from
concordancing activity.
• Sripicharn (2002) evaluated the use
of teacher-designed DDL materials.
• The Data-Driven Learning (DDL) approach
claims that language learners can use
data derived from electronic corpora as
linguistic sources. DDL attempts to “cut
out the middleman as far as possible and
give direct access to the data so that the
learner can take part in building up his or
her own profiles of meanings and uses”
(Johns, 1991b, pp. 30-31).
• Three effects of adopting the DDL approach:
– 1. DDL can affect the language learning process
by helping the learner develop the ability to see
patterns in the target language, and by
extension, to form generalizations to account
for that patterning;
– 2. DDL also has an effect on the teacher’s role,
in that the teacher becomes more like a director
or coordinator of student research; and
– 3. DDL makes possible a new style of
grammatical consciousness-raising by placing
the learner’s own discovery of grammar at the
center of language learning.
Research Questions
• 1. How do the students describe their problem-solving
processes and strategies in the web-based
concordancing setting?
• 2. How do the students describe their views toward
using corpus data as a linguistic reference?
• Participants
• Nada, 31, held an undergraduate degree in
engineering from a Thai university and
had been in the U.S. for fifteen months; TOEFL
score of 550.
• Karn, 26, was a student in the Intensive English
Program and had been in the U.S. for seven
months; TOEFL score of 540.
The Context
• Workshop: met four times, two
hours each
• A textbook titled Paragraph
Writing: A Process Approach
Rhetorical foci
Narrating past events
People Around Us
Describing a person
A Special Place
Describing a place
Describing present events
Leisure Time
Giving details and examples
Education and Student Life
Giving reasons to support opinions
Each chapter was organized to follow eight steps in the writing process:
generating ideas, organizing ideas, developing cohesion, writing the first draft,
revising your writing, editing your writing, writing the second draft, and
developing your skills.
Instruments and Data
Concordancer and corpora
Narrative description of think-alouds
Interviews and transcriptions
Pre and post-project questionnaires
Pre-test and post-test
Written reflections
Collocation worksheets
Think-aloud task
The task was based on the following ten collocations, adapted from the
British National Corpus (Written and Spoken) and the Brown Corpus on
the Lextutor website. Those collocations were (1) in his late teens, (2) of
medium height, (3) ashamed of, (4) on the phone, (5) between 10
p.m. and 2 a.m., (6) get so annoyed, (7) turn grey, (8) immediate
reaction, (9) came running, and (10) one of the events. Then, eight
out of ten collocations were replaced by the following incorrect
versions (marked with an asterisk):
1. I met someone who was *about his late teens.
2. He is *a medium height.
3. He is now *ashamed from his conduct.
4. He talks *on a phone.
5. He talks *between 10 p.m. to 2 a.m.
6. I get so annoyed with his talking and laughing.
7. Some days I think my hair could *get grey because of my roommate’s
behavior.
8. My roommate always has *immediately reaction to the expressions.
9. He came running to pick me up.
10. It was *one of events that showed me how thoughtful he is.
Students’ problem-solving processes and
strategies in a web-based concordancing
setting
• The participants guessed the correct
answer whenever they could and then
used the concordancer to confirm or
disconfirm their initial assumption.
An example of when concordance lines disconfirmed their
guess. Item 5, He talks *between 10 p.m. to 2 a.m.
Nada: “I have always thought between (time) to (time) was correct because this phrase
can be written with a dash and a dash to me means to. I am surprised. Now I am looking
at other examples. I can see that between ...and … can be used with many different
things such as age (between 2 and 5 years old), distance (between 200 and 400 meters),
percent (between 11 and 17 percent).”
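The confirm-or-disconfirm strategy Nada describes can be sketched as a comparison of pattern counts. The code below is only an illustration: the helper `count_pattern`, the regular expressions, and the toy corpus are assumptions, and the patterns allow just a single token between "between" and "and"/"to".

```python
import re

# Toy corpus invented for illustration (hypothetical sentences).
toy_corpus = [
    "Children between 2 and 5 years old attend for free.",
    "The track is between 200 and 400 meters long.",
    "Turnout rose between 11 and 17 percent.",
    "The office is open from 9 a.m. to 5 p.m.",
]

def count_pattern(texts, pattern):
    """Count how many times a regular expression matches across all texts."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sum(len(regex.findall(text)) for text in texts)

# The learner's initial guess versus the pattern seen in the concordance lines.
between_to = count_pattern(toy_corpus, r"\bbetween\s+\S+\s+to\b")
between_and = count_pattern(toy_corpus, r"\bbetween\s+\S+\s+and\b")

print("between ... to :", between_to)
print("between ... and:", between_and)
```

On this toy data the guess "between ... to" finds no support while "between ... and" appears with ages, distances, and percentages, which mirrors the kind of evidence that led Nada to revise her assumption.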
• They used their English grammatical knowledge to
help them arrive at a correct answer.
• However, heavy reliance on grammatical
knowledge (Karn’s case) seemed to keep him
from paying attention to the context of the
question and the examples on the output page,
and led him to give a wrong answer.
– Item 7 Some days I think my hair could *get grey because
of my roommate’s behavior.
Paying attention to the context in which the
collocations appeared was important.
Nada’s case, item 2: He is *a medium height.
She found 118 concordance lines. She looked at the ones that were
relevant to her query, as shown in the figure, and clicked on the KWIC
(Key Word In Context) keyword, HEIGHT, on line 55 to see its full text. She
decided that of medium height was the correct answer because it can
be used to describe a person’s appearance, and her item 2 seemed to
have the same purpose.
• Problems that the participants
encountered:
• 1. Sometimes too many results on
the output page; other times, too
few results;
• 2. The issue of choosing a keyword.
If they picked the wrong keyword,
they did not receive the right set of
results;
• 3. The problem of choosing a corpus.
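The keyword problem can be illustrated with item 7 (*get grey, where the target collocation is turn grey). Searching for the incorrect phrase as a whole returns nothing, whereas searching for the single keyword "grey" and counting the word to its left surfaces the verbs that actually occur. The sketch below is one possible way to show this; the functions `phrase_hits` and `left_collocates` and the toy corpus are assumptions, not part of the study's tools.

```python
import re
from collections import Counter

# Toy corpus invented for illustration (hypothetical sentences).
toy_corpus = [
    "His hair began to turn grey in his forties.",
    "Her hair turned grey almost overnight.",
    "The sky went grey before the storm.",
    "He watched his beard turn grey.",
]

def phrase_hits(texts, phrase):
    """Count exact (case-insensitive) occurrences of a whole phrase."""
    return sum(text.lower().count(phrase.lower()) for text in texts)

def left_collocates(texts, keyword):
    """Count the word immediately to the left of each occurrence of `keyword`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token == keyword and i > 0:
                counts[tokens[i - 1]] += 1
    return counts

print(phrase_hits(toy_corpus, "get grey"))                # whole-phrase search
print(left_collocates(toy_corpus, "grey").most_common())  # single-keyword search
```

Choosing the narrower keyword trades a precise but empty search for a broader one whose results still need to be read in context.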
Students’ views toward using
corpus data as a linguistic
reference
• The perceived advantages of
corpus use:
• 1. They could take a more active role
in learning;
• 2. They could be more autonomous;
• 3. They remembered the new
knowledge better.
• Perceived disadvantages of
corpus use:
• 1. The complex process involved in
using the tool (It took time and effort to
get to the data and to derive correct
collocations from corpus data, due to the
difficulty level of the examples and the
availability of data in the corpora);
• 2. The appearance of the tool is not
The results of the post-project
questionnaires corresponded to the
interview responses in many ways.
Benefits of corpus use
Problems or difficulties of corpus use
Overall evaluation of corpus use
• The results of this study suggest that whether learners can
successfully take part in building up their own profiles of
meanings and uses partly depends upon their English
proficiency level. The student with a high English proficiency
demonstrated a strong capability in analyzing corpus data
and using inductive abilities to solve collocation problems;
however, this was not always the case for the lower
proficiency student. Even so, access to linguistic data through
established software proved to offer invaluable input.
As a pedagogical recommendation, some
assistance, such as scaffolding activities or step-by-step
instructions on how to analyze corpus data, could be useful
from the outset to guide learners toward
successful collocation learning.
• 1. Two types of collocation that should have been
included were noun + verb and noun + noun.
• 2. The think-aloud task should have
included items where participants were required
to come up with a new keyword, such as the one
in item 7 (*get grey). These types of prompts could
distinguish between high- and low-proficiency
students’ problem-solving processes.
• 3. It would have been useful to keep track of time
for each concordance session, to examine whether
the participants took less time as they worked on
later concordance tasks.
• Future studies in the same vein should consider
addressing these limitations.
