Retrospective vs. concurrent think-aloud protocols: usability testing

“Retrospective vs. concurrent think-aloud
protocols: usability testing of an online library catalogue”
Presented by:
Aram Saponjyan & Elie Boutros
• The article discusses the think-aloud
techniques used in usability testing.
• The two main think-aloud approaches,
retrospective and concurrent, are compared
through a usability test of an online library catalogue.
• The three main points of comparison are: the
usability problems detected, overall task
performance, and the participants' experiences.
Think-Aloud Protocol
• A method of usability evaluation.
• A method that allows researchers to
understand the thought process of testers as
they use a given product or device.
• It lets software designers interact with
potential users and improve their designs
based on user feedback.
• In retrospective think-aloud (RTA), participants
perform a set of tasks silently (while being
videotaped) and then verbalize their experience
at the end of the test session while watching
the recording of themselves.
• In concurrent think-aloud (CTA), participants
verbalize their thoughts while performing the
tasks. A facilitator is always present to remind
them to “think aloud” if they fall silent.
• Both the retrospective and the concurrent
techniques are used in usability tests of
websites, GUIs, and database front ends.
• Both techniques are valid, useful, and widely
adopted in usability tests.
• Both methods yield nearly unbiased software
evaluations, since participants do not have to
recall their thoughts long after performing the tasks.
Advantages of CTA
• CTA tends to produce less biased verbalizations,
since users verbalize their thinking during task
performance: “CTA is [a] more faithful
representative of a strictly task oriented usability test.”
• CTA reveals more observable problems during
task completion, whereas RTA depends heavily
on the user’s verbalizations, which take place
only after task completion.
Disadvantages of CTA
• Users may feel uncomfortable verbalizing
their thoughts while performing the task at
hand, especially if they are not speaking
their native language.
• Participants carry the extra burden of speaking
their thoughts while performing the tasks,
whereas in RTA users have more time to
verbalize problems after task completion.
Effects of CTA disadvantages on test performance
• This burden did not slow down task completion,
but it did affect the success rate: CTA
participants were less successful in completing
their tasks than RTA participants.
Advantages of RTA
• Participants are not burdened with the extra task
of verbalizing their thoughts as they work. This
makes the method easier for non-native English
speakers, who have more time to think and to
translate their thoughts from their native
language into English.
• Another benefit of RTA is a potential decrease
in reactivity (thinking aloud changing how
participants perform), since participants can
perform a task at their own pace without being
rushed in a way that could alter their normal
software use. This makes them more likely to
perform neither better nor worse than usual.
Disadvantages of RTA
• RTA may describe the user experience less
precisely than CTA, because users describe
their experience only after finishing their
tasks. This delay may introduce bias, since
participants may forget specific things they
experienced during task performance.
• Overall session time is longer in RTA than in
CTA, since RTA participants not only perform
their tasks but also watch the recording afterwards.
Test Object
• The online library catalogue was chosen for
testing because it combines the characteristics
of a search engine and a website, which makes it
complex enough to challenge novice users.
• The participants were 40 university students
recruited through email announcements and
printed forms.
• The participants were aged 18 to 24 and
received financial compensation for taking part.
• The tasks were of equal difficulty and
independent of one another, so that participants
would not get stuck.
• They were designed to cover the catalogue’s
main search functions.
• These search functions included simple
search, advanced search, sorting results, and
filtering results.
• Two different questionnaires were given to
the participants: one at the beginning of the
test session and one at the end.
• The first asked about participants' demographic
details, such as age, gender, and education.
• The second asked how participants felt about
taking part in the experiment.
Processing of the data
First, the total number of usability problems detected
in each condition was examined. The problems were
then classified by the way they had surfaced in the data:
• through observation of the behavioral data
• through verbalization by the participant
• through a combination of observation and
verbalization
Problem Types
• Layout problems: The participant fails to spot a
particular element within a screen of the catalogue;
• Terminology problems: The participant does not
comprehend part(s) of the terminology used in the
catalogue;
• Data entry problems: The participant does not know
how to conduct a search (i.e., enter a search term, use
dropdown windows, or start the actual search);
• Comprehensiveness problems: The catalogue lacks
information necessary to use it effectively;
• Feedback problems: The catalogue fails to give
relevant feedback on searches conducted.
• 93% of all comments made by CTA
participants corresponded to an observable
problem in their task execution, compared with
54% of the comments made by RTA participants.
• Of the 72 problems that were detected, 47%
were reported in both conditions, 31% were
detected exclusively in the CTA condition, and
another 22% were detected exclusively in the
RTA condition.
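The slide reports only rounded percentages of the 72 problems. As a rough arithmetic check, the sketch below searches for whole-number splits of 72 whose percentages round to 47%, 31%, and 22%. It assumes the shares were rounded to the nearest whole percent; the counts it prints are inferred under that assumption, not figures taken from the slide.

```python
from itertools import product

TOTAL = 72  # total problems detected (from the slide)
# Reported shares; assumed to be rounded to the nearest whole percent.
REPORTED = (47, 31, 22)  # (both conditions, CTA only, RTA only)


def pct(count: int, total: int = TOTAL) -> int:
    """Percentage of `total`, rounded to the nearest whole number
    (Python's round sends exact .5 ties to the even number)."""
    return round(100 * count / total)


def consistent_counts() -> list[tuple[int, int, int]]:
    """Return every (both, cta_only, rta_only) split of TOTAL whose
    rounded percentages match the reported ones."""
    matches = []
    for both, cta_only in product(range(TOTAL + 1), repeat=2):
        rta_only = TOTAL - both - cta_only
        if rta_only < 0:
            continue  # the three groups must partition the 72 problems
        if (pct(both), pct(cta_only), pct(rta_only)) == REPORTED:
            matches.append((both, cta_only, rta_only))
    return matches


if __name__ == "__main__":
    for both, cta_only, rta_only in consistent_counts():
        print(f"both={both}, CTA only={cta_only}, RTA only={rta_only}")
        # Each method detects the shared problems plus its exclusive ones.
        print(f"CTA total={both + cta_only}, RTA total={both + rta_only}")
```

Under this rounding assumption only one split fits (34 shared, 22 CTA-only, 16 RTA-only), which would imply that CTA surfaced 56 of the 72 problems and RTA 50.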
• This table shows that 89% of all problem
detections involved problems that were
experienced by participants in both
conditions.
What do these tables show?
• The CTA participants had to verbalize and work at
the same time, which gave them less time to
comment on problems that were not acute.
• While the CTA method reveals more problems
that can be observed during task performance,
the RTA method depends more on the
participants’ verbalizations.
• Verbal protocols in this study serve not so much
to reveal problems as to verbally support
problems that are otherwise observed.
Task performance
• Does the double workload in CTA have an effect on
the participants’ task performance?
• Indicators:
  - successful completion of the seven tasks
  - the time it took the participants to complete them
• Result: No significant differences were found.
Participant experiences
• Questions:
  - experiences with concurrent or retrospective thinking aloud
  - method of working
  - presence of the facilitator and the recording equipment
• Result: No significant differences in how the participants
in the two conditions experienced CTA and RTA.
• However, CTA participants found the test situation less
disturbing than RTA participants did.
• Explanation:
  - RTA participants are given more time to fill
in the questionnaire.
  - The facilitator's presence during the first part of the
RTA test (silent task performance) serves less of a purpose
than in a CTA design, and it may be confronting for
participants to watch their own actions on video.
  - The CTA participants had to actively perform tasks and
think aloud, which considerably reduced the attention
they could spare for noticing the facilitator
and the recording equipment.
• Both methods were comparable in quantitative
output, but they differed significantly in how this
output was obtained.
• RTA method proved to be more effective in revealing
problems that were not observable, but could only be
detected by means of verbalization.
• RTA participants tended to give explanations and
suggestions, while CTA participants more often limited
themselves to giving descriptions of their actions.
• The participants’ verbalizations contributed very little
to the outcome of the usability test (in terms of user
problems detected).
• The task of concurrently verbalizing thoughts caused
the participants to make more errors while performing
the tasks and to be less successful in completing
the seven tasks.
• The weaker performance in the CTA condition likely lies
in the participants’ workload: the difficulty of the tasks
may have been a crucial factor in this study.
• A strong new argument in favor of RTA protocols is
that they may be less susceptible to the influence of
task difficulty, both in terms of reactivity and in terms
of the completeness of the verbalizations.

