
Issues in Technical Communication: Usability Evaluation

A Brief Overview of Usability Evaluation and its Implications on the Field of Technical and Professional Communication

Usability Evaluation: Introduction

A central component of my multi-stage graduate project is iterative usability testing of web-based digital portfolios for K-5 students at Summers-Knoll (SK) School in Ann Arbor, Michigan. Additional testing will be conducted on the online version of the Work Sampling System (WSS), a University of Michigan-designed student performance assessment tool. Teachers at SK currently use WSS in its hardcopy format, and, through implementation of a variety of usability evaluation methods (UEMs), my project will attempt to determine whether a school-wide shift to the online application will better meet the needs of SK's diverse student and parent community.

A primary and preliminary step in planning my winter 2011 graduate project was conducting dual literature reviews on usability evaluation and digital portfolios/portfolio assessment. The former review provided me with a historical context of usability, an introduction to popular contemporary applications, and a survey of current issues and trends, especially with regard to usability testing of web-based applications. In this essay, I will focus on the history, methods, and limitations of usability evaluation as they pertain to the field of technical and professional communication (TPC).

A Brief History of Usability

Usability has been a component of the human experience for a very long time, perhaps even predating our current evolutionary categorization. In its simplest terms, usability is one person trying to help another improve his or her performance. Research-based, systematic efforts to improve human performance began in the early nineteenth century, when a group of astronomers attempted to develop a set of best practices for star-tracking performance (Bailey, 2005). Usability efforts have since been systematically adapted to inform and shape numerous areas of research and development, and the field is known by a variety of names including:


  • Usability engineering
  • Human factors
  • Human performance engineering
  • Human engineering
  • User experience design
  • Ergonomics

Regardless of what it’s called, usability has become a ubiquitous component of contemporary system design and quality assurance, and Alshamari and Mayhew (2009) refer to it as one of the most important factors in a system’s quality and success. Despite the prevalence of usability across a diverse array of platforms, industries, and contexts, however, the practice lacks a shared understanding, definition, and vocabulary among scholars, practitioners, and proponents, and Hartson, Andre, and Williges call for a “standard way to describe usability problems [and] for a framework within which usability problem descriptions can be more easily and more directly compared” (2003, p. 159). The International Organization for Standardization (ISO) defines usability as “the extent to which the product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use” (Alshamari & Mayhew, 2009, p. 402). Krug’s simplistic definition of usability as “watching people try to use what you’re creating / designing / building…with the intention of (a) making it easier for people to use or (b) proving that it is easy to use” (2010, p. 13) disregards the usability evaluation methods (UEMs) of cognitive walkthrough (CW) and heuristic evaluation (HE), as well as a variety of automated UEMs employed to determine product compliance with predetermined criteria. Dicks suggests that most usability experts include the following four aspects of usability in their conception and understanding of the term (2002, p. 27):


  1. Easy to learn
  2. Useful
  3. Easy to use
  4. Pleasant to use

In its current iteration, scholars generally agree that usability, when used to describe software evaluation, broadly refers to elements of effectiveness, efficiency, and user satisfaction (Alshamari & Mayhew, 2009). It is this understanding of usability that I will use for the purposes of this essay and for my upcoming graduate research project.
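The three dimensions above, effectiveness, efficiency, and satisfaction, can each be measured from ordinary test-session data. A minimal sketch of that idea; the session fields and the 1-5 satisfaction scale here are illustrative assumptions, not part of the ISO standard or of any particular instrument:

```python
from statistics import mean

def summarize_usability(sessions):
    """Summarize test sessions on the three ISO dimensions.

    Each session is a dict with:
      'completed'    -- bool, did the user finish the task? (effectiveness)
      'seconds'      -- float, time on task                 (efficiency)
      'satisfaction' -- int, hypothetical 1-5 post-task rating
    """
    return {
        "effectiveness": sum(s["completed"] for s in sessions) / len(sessions),
        "efficiency_s": mean(s["seconds"] for s in sessions),
        "satisfaction": mean(s["satisfaction"] for s in sessions),
    }

# Invented example data for three test sessions.
sessions = [
    {"completed": True,  "seconds": 42.0, "satisfaction": 4},
    {"completed": True,  "seconds": 58.5, "satisfaction": 5},
    {"completed": False, "seconds": 90.0, "satisfaction": 2},
]
print(summarize_usability(sessions))
```

The point of the sketch is only that the three dimensions are operationalized separately: a system can score well on one (fast task times) while failing another (low completion or satisfaction).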

Popular Methods of Usability Evaluation

Software usability evaluation can be broadly divided into two categories: usability testing, which involves users, and usability inspection, which can be conducted independent of users. These two categories of evaluation often work in conjunction with each other, and the lines between them are routinely blurred by the lack of standardized methodologies and definitions.

Usability Testing

Usability testing, that is, testing a product or system with users, is perhaps the most commonly employed usability practice among system designers, computer professionals, and technical communicators. The three most prominently used UEMs are CW, HE, and thinking aloud study (TA). Following are brief introductions to these methods; heuristic recommendations for their application can be found in the literature listed in this essay’s references section.

Cognitive Walkthrough (CW)

According to Hertzum and Jacobsen, CW was devised to allow computer professionals to detect usability issues in a user interface based on a detailed specification document, screen mock-ups, or a running system (2003, p. 185). A typical CW schema involves a specialist designing a hypothetical end-user scenario: the user is described along with the tasks he or she is to complete on the system, and the specialist indicates the correct or most efficient sequence of steps the user would employ to complete each task. After the scenario is completed, the evaluator asks four questions (Hertzum & Jacobsen, 2003, p. 185):

  1. Will the user try to achieve the right effect?
  2. Will the user notice that the correct action is available?
  3. Will the user associate the correct action with the effect he or she is trying to achieve?
  4. If the correct action is performed, will the user see that progress is being made toward solution of the task?

From the perspective of the hypothetical user, the evaluator determines whether each question leads to success or failure. Failures indicate usability problems that may be addressed and repaired before conducting usability testing with actual users.
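The question-per-step loop just described is mechanical enough to record in a simple structure. A minimal sketch, in which the step names and the answers are hypothetical examples, not drawn from any real evaluation:

```python
# The four cognitive walkthrough questions (Hertzum & Jacobsen, 2003).
CW_QUESTIONS = (
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the effect?",
    "Will the user see progress toward solution of the task?",
)

def walkthrough(steps):
    """steps: list of (step_name, [bool, bool, bool, bool]) pairs,
    one yes/no answer per CW question. Returns the failures found."""
    problems = []
    for step_name, answers in steps:
        for question, ok in zip(CW_QUESTIONS, answers):
            if not ok:
                # Each failed question marks a usability problem
                # to repair before testing with actual users.
                problems.append((step_name, question))
    return problems

# Hypothetical two-step task on a digital-portfolio interface.
steps = [
    ("open portfolio page", [True, True, True, True]),
    ("upload work sample",  [True, False, True, True]),
]
print(walkthrough(steps))
```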

An often-employed extension of the CW is the pluralistic walkthrough (PW). PW modifies the CW design by including users, developers, and usability specialists in completing the original CW tasks. This method allows for interface approach and understanding from a variety of perspectives and may expedite the test-redesign-retest process (Sing & Der-Thanq, 2004).

Heuristic Evaluation (HE)

HE is an informal, group review of an interface. It involves a team of evaluators examining a system and judging its compliance with a set of recognized usability principles, or heuristics. Nielsen provides a set of ten general heuristics that state the system should (as cited in Hertzum & Jacobsen, 2003):

  1. Provide visibility of system status
  2. Ensure a match between the system and the real world
  3. Allow for user control and freedom
  4. Be consistent and follow standards
  5. Prevent errors
  6. Utilize recognition rather than recall
  7. Allow for flexibility and efficiency of use
  8. Provide aesthetic and minimalist design
  9. Help users recognize, diagnose, and recover from errors
  10. Provide help and documentation

Recent HE research has included the development of non-expert protocols (a sort of HE for dummies, if you will) and advanced use of automated systems to provide more cost-effective HE services. While Dicks warns that HE data generated by automated systems do not constitute usability testing but rather the less valuable usability inspection (2002), automated HE tools can provide authentic systems monitoring along with valuable quantitative statistics. HE can be classified as usability testing or usability inspection depending on individual HE method design (automated vs. expert-driven), and it is important to recognize the value of each as well as to identify when each may be most appropriate.
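Because HE is a team review, its findings must eventually be merged: one common approach is to combine each evaluator’s list of violated heuristics and rank the violations by how many evaluators flagged them. A minimal sketch of that merge; the example reports are invented:

```python
from collections import Counter

def merge_he_findings(reports):
    """reports: one list of violated heuristics per evaluator.
    Returns (heuristic, n_evaluators) pairs, most-flagged first."""
    counts = Counter()
    for report in reports:
        counts.update(set(report))  # count each evaluator at most once
    return counts.most_common()

# Hypothetical reports from three evaluators, phrased in terms of
# Nielsen's heuristics listed above.
reports = [
    ["error prevention", "consistency and standards"],
    ["error prevention"],
    ["help and documentation", "error prevention"],
]
print(merge_he_findings(reports))
```

Counting each evaluator at most once per heuristic keeps one verbose reviewer from dominating the ranking; the heuristic everyone flags rises to the top as the likeliest real problem.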

Thinking Aloud Study (TA)

TA may be what one generally thinks of when hearing the term “usability testing.” Though numerous variations of TA exist and there is currently no definitive statement of the method’s aim and usage, TA studies share a similar fundamental structure: a small number of users think out loud while solving tasks with a system under development, and an evaluator identifies usability problems based on observations of the test participants. TA may occur at various times during the development cycle of a system or product, and it is generally agreed that integrating TA into early stages of development helps identify and solve problems in their infancy. Employing TA as an add-on after system development is complete is generally frowned upon, though the practice is widely used as a means of cutting development costs. (Of course, identifying usability problems post-development has the potential to cancel out any savings associated with forgoing iterative TA testing.)

A popular variation of TA is the constructive interaction test in which two participants use the system together. This method is thought to be more natural for users, because people are more used to verbalization when they solve problems collaboratively (Sing & Der-Thanq, 2004).

Limitations of Usability Evaluation

Though Krug suggests “testing with one user is 100 percent better than testing with none” (2010, p. 39), Rubin warns “even the most rigorously conducted formal test cannot with 100 percent certainty ensure that a product will be usable when released” (as cited in Dicks, 2002, p. 28). Dicks cites four main reasons why usability testing is not a cure-all (2002, p. 28):

  1. Testing is always an artificial situation.
  2. Test results do not prove that a product works.
  3. Participants are rarely fully representative of the target population.
  4. Testing is not always the best technique to use.

Further, Dicks lists five general categories of misconceptions about usability that threaten the field with increased ambiguity and divergence (2002, p. 26):

  1. Misunderstanding of the concept of usability itself and of the distinctions between usability studies and empirical tests
  2. Misusing statistical results and assuming a set of quantitative statistics equals a usability evaluation
  3. Using usability tests for verification rather than validation
  4. Lack of knowledge of usability methodology limitations
  5. Testing for ease of use but not usefulness

Hertzum and Jacobsen identify three primary shortcomings of UEMs (2003, pp. 196-200):

  1. Vague goal analysis
  2. Vague evaluation procedures
  3. Vague problem criteria

Of these three, vague problem criteria is perhaps the most troubling. Hertzum and Jacobsen describe it as “differences in the evaluators’ thresholds regulating when a difficulty or inconvenience becomes a problem” (2003, p. 199). In other words, definitions of terms such as “problem,” “difficulty,” and “challenge” vary across evaluators, further contributing to the standardization problems rampant in the field of usability evaluation and touched on briefly in this essay.

The Evaluator Effect

Perhaps the greatest single problem in usability practice is the “evaluator effect”: different evaluators testing the same system detect substantially different sets of usability problems (Hertzum & Jacobsen, 2003). In 1998, Jacobsen, Hertzum, and John conducted seminal research on the evaluator effect. The researchers had four evaluators individually analyze four usability test sessions and found that only 20 percent of the 93 detected problems were identified by all four evaluators. Further, when each evaluator listed the ten most severe problems he or she had identified across the combined sessions, not one of those severe problems appeared on all four evaluators’ lists. For all three primary evaluation methods discussed in this paper (CW, HE, and TA), a single evaluator was unlikely to detect the majority of the problems that were detected collectively. The researchers determined that evaluators were using subjective criteria to identify usability problems, calling into question the reliability of UEMs such as CW, HE, and TA. Nielsen attributes part of the evaluator effect to evaluators’ inability to distinguish between usability problems (problems concerning how the system is to be operated) and utility problems (problems concerning what the system can do) (1993). Hertzum and Jacobsen claim the primary culprit is the fact that “usability evaluation is a cognitive activity, which requires that the evaluators exercise judgment” (2003, p. 201).
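The 20-percent figure above is an any-versus-all overlap measure: the problems found by every evaluator divided by the problems found by at least one. A minimal sketch of that calculation, with invented problem IDs rather than data from the 1998 study:

```python
def detection_overlap(problem_sets):
    """problem_sets: one set of detected problem IDs per evaluator.
    Returns (found_by_all, found_by_any, overlap fraction)."""
    found_by_all = set.intersection(*problem_sets)
    found_by_any = set.union(*problem_sets)
    return found_by_all, found_by_any, len(found_by_all) / len(found_by_any)

# Hypothetical findings from four evaluators reviewing the same sessions.
evaluators = [
    {"P1", "P2", "P3", "P4"},
    {"P1", "P3", "P5"},
    {"P1", "P2", "P6"},
    {"P1", "P4", "P5", "P7"},
]
all_found, any_found, overlap = detection_overlap(evaluators)
print(sorted(all_found), len(any_found), round(overlap, 2))
# → ['P1'] 7 0.14
```

A low overlap fraction, here one problem in seven, is exactly the evaluator-effect pattern: each evaluator’s individual list looks reasonable, but the lists barely agree.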

Research indicates that the evaluator effect persists across differences in:

  • Evaluator experience
  • Evaluator methodology
  • System domain
  • Prototype fidelity
  • Problem severity

While it’s unlikely that the evaluator effect can ever be completely eliminated from usability testing and inspection, its effects may be reduced by creating standardized usability practices and definitions, two challenges faced by the field and identified across the literature. Further, Hertzum and Jacobsen recommend the following for minimizing the evaluator effect (2003, p. 202):

  1. Be explicit in goal analysis and task selection.
  2. Involve an extra evaluator, at least in critical evaluations.
  3. Reflect on evaluation procedures and problem criteria.

Conclusion: The Future of Usability Evaluation

Despite the many problems inherent in usability evaluation and the fact that, according to Dicks, it “has become so popular that its value is being threatened” (2002, p. 26), none of this should deter scholars and practitioners from continued research and application. Though research-based studies of human performance are roughly 200 years old, their application to software systems remains relatively young. Many of the problems with current UEMs are natural in the development of new disciplines and areas of inquiry, and, in spite of all its shortcomings, “user-based laboratory testing can provide a good indication of the types of problems that actually impact users, given a broad enough range of scenarios and appropriate participant heterogeneity” (Hartson et al., 2003, p. 177). The success of future scholarship and practice in the field of usability evaluation relies on developing objective criteria for measurement and on using those criteria across industries and platforms. The core problem with usability appears to be divergent study and application across a multitude of disparate fields: heuristics are being developed without concern for cross-application, and the result is a piecemeal discipline struggling to define itself without an established framework and common language. It’s as if practitioners are all speaking the same tongue, but each has his or her own unique dialect. The field is still awaiting its authoritative publication, its Webster or its Strunk and White, to establish the definitive set of standards.

Works Cited

Akpinar, Y., Simsek, H. (2007). Pre-Service Teachers’ Learning Object Development: A Case Study in K-12 Setting. Interdisciplinary Journal of Knowledge and Learning Objects, 3, 197-217.

Alshamari, M., Mayhew, P. (2009). Technical Review: Current Issues of Usability Testing. IETE Technical Review, 26.6, 402-406. doi:10.4103/0256-4602.57825

Bailey, B. (2003). Evaluating the ‘Evaluator Effect.’

Bailey, B. (2005). Usability 101 (online lecture).

Cobus, L., Dent, V.F., Ondrusek, A. (2005). How Twenty-Eight Users Helped Redesign an Academic Library Web Site. Reference & User Services Quarterly, 44.3, 232-246.

Dicks, R.S. (2002). Mis-Usability: On the Uses and Misuses of Usability Testing. Proceedings of the 20th Annual International Conference on Computer Documentation, 26-30. doi:10.1145/584955.584960

Hartson, H.R., Andre, T.S., Williges, R.C. (2003). Criteria for Evaluating Usability Evaluation Methods. International Journal of Human-Computer Interaction, 15.1, 145-181. doi:10.1207/S15327590IJHC1501_13

Hertzum, M., Jacobsen, N.E. (2003). The Evaluator Effect: A Chilling Fact About Usability Evaluation Methods. International Journal of Human-Computer Interaction, 15.1, 183-204. doi:10.1207/S15327590IJHC1501_14

Krug, S. (2010). Rocket Surgery Made Easy: The Do-It-Yourself Guide to Finding and Fixing Usability Problems. Berkeley, CA: New Riders.

Nielsen, J. (1993). Usability Engineering. Boston: Academic Press.

Scowen, G., Regenbrecht, H. (2009). Increased Popularity Through Compliance With Usability Guidelines in E-Learning Web Sites. International Journal of Information Technology and Web Engineering, 4.3, 38-57. doi:10.4018/jitwe.2009100603

Sing, C.C., Der-Thanq, V.C. (2004). A Review on Usability Evaluation Methods for Instructional Multimedia: An Analytical Framework. International Journal of Instructional Media, 31.3, 229-238.

Skov, M.B., Stage, J. (2009). A Conceptual Tool for Usability Problem Identification in Website Development. International Journal of Information Technology and Web Engineering, 4.4, 22-35. doi:10.4018/jitwe.2009100102

Teoh, K.K., Ong, T.S., Lim, P.W., Liong, R.P.Y., Yap, C.Y. (2009). Explorations of Web Usability. American Journal of Applied Sciences, 6.3, 424-429.

