
Annotated Bibliography: Usability Evaluation

Akpinar, Y., Simsek, H. (2007). Pre-Service Teachers’ Learning Object Development: A Case Study in K-12 Setting. Interdisciplinary Journal of Knowledge and Learning Objects, 3, 197-217.

Based on the results of a pre-service teacher usability study conducted at Bogazici University, Istanbul, Turkey, Akpinar and Simsek’s essay focuses on how easily participants could design and develop web-based learning objects (LOs) using a learning content management system (LCMS). The authors define an LO as “an independent collection of content and media elements” whose primary value or utility depends on the instructional contexts in which it exists (p. 198). They look specifically at LO development in K-12 environments and the instructional software tools used by teacher-developers. After a literature review that identifies popular LO tools used in Australia, Europe, New Zealand, and the United States, as well as best practices in interactive e-learning design, the researchers conduct usability tests on an emerging LCMS product called BU-LeCoMaS. The subjects of the research are 76 student teachers from a variety of specialized backgrounds with varying levels of LO design competency. Data collection methods include Likert-type questionnaires, open-ended questionnaires, and artifact analyses.

This study is relevant to my field of inquiry because it focuses exclusively on usability among K-12 educators. Especially helpful is the authors’ comparison of teacher-developed LOs to research-generated best practices. Perhaps most exciting for my needs and interests is the finding that, with this particular LCMS, there were no significant differences between novice and more experienced LO developers. This suggests that advanced technologies can be user friendly depending on how they are framed or presented to end users.



Alshamari, M., Mayhew, P. (2009). Technical Review: Current Issues of Usability Testing. IETE Technical Review, 26.6, 402-406. doi:10.4103/0256-4602.57825

Alshamari and Mayhew’s brief article provides a concise overview of current issues in usability testing. The authors state straightaway that “usability is one of the most important success factors in system quality” (p. 402) and follow with several experts’ definitions of usability, summarizing that most of these emphasize efficiency, effectiveness, and user satisfaction. They further offer a survey of usability evaluation methods and an introductory explanation of the “evaluator effect,” a term that recurs in some of the literature I’ve perused. In a nutshell, the evaluator effect refers to the limits of a test facilitator’s ability to detect problems, which depends on his or her interpretation of the data. The essay highlights other key areas of contemporary usability testing research, including the ongoing debate over the number of users needed to generate reliable data; how to prioritize identified user problems; how to measure abstract concepts like efficiency, effectiveness, and user satisfaction; and the limitations of usability testing of web-based products.

I find this article helpful for my research because it provides snapshot views of current scholarly inquiry and debate. Its perspectives from experts in usability research — Dicks, Hertzum, Jacobsen, Krug, Miller, Nielsen, etc. — provide a nice framework for conducting my own literature review, and the usability statistics the authors supply, particularly on usability testing problems, offer subtopics to help guide my research and ensure its timeliness.



Cobus, L., Dent, V.F., Ondrusek, A. (2005). How Twenty-Eight Users Helped Redesign an Academic Library Web Site. Reference & User Services Quarterly, 44.3, 232-246.

Cobus, Dent, and Ondrusek conducted a usability study of Hunter College Library’s web site. Hunter College is part of CUNY, whose campuses span Manhattan, Brooklyn, Queens, and Staten Island; its librarians and graduate assistants tested their site for ease of use and clarity of purpose and involved student users directly in the redesign process. This essay details the testers’ data collection methodologies, which are based on prior research by field experts Krug, Nielsen, and Rubin. Interestingly, the Hunter team chose a pool of 28 participants for their study, as opposed to Krug and Nielsen’s recommended three to five, based on Rubin’s research indicating that a group of eight to ten testers uncovers 80 to 100 percent of a web site’s problem areas. Further, using a larger test audience, per the pre-existing research and this essay’s authors’ findings, allows for better identification of data trends. Per Krug’s recommendations, the Hunter team conducted usability tests in phases, or rounds, allowing for ongoing test modification and greater experimentation with data analysis methodology. The authors report shifting from a predominantly quantitative approach to a more qualitative approach as they moved through the study phases, a shift based on end user feedback and the direct involvement of participants in the test design. They conclude by suggesting that iterative usability testing and modification be ongoing for dynamic research web sites: “As user needs change, so must the site” (p. 242).
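The sample-size figures in this debate trace back to a simple problem-discovery model, often attributed to Nielsen and Landauer: if each tester independently finds any given problem with probability L, then n testers are expected to find 1 - (1 - L)^n of the problems. A minimal sketch, assuming the frequently cited average of L = 0.31 (actual discovery rates vary widely from study to study):

```python
# Problem-discovery model behind the "how many testers?" debate:
# expected share of problems found = 1 - (1 - L)**n, where L is the
# probability that a single tester uncovers any given problem.
# L = 0.31 is the often-cited average and is an assumption here.

def share_of_problems_found(n_testers: int, discovery_rate: float = 0.31) -> float:
    """Expected proportion of usability problems found by n_testers."""
    return 1 - (1 - discovery_rate) ** n_testers

if __name__ == "__main__":
    for n in (3, 5, 8, 10):
        print(f"{n:2d} testers: {share_of_problems_found(n):.0%}")
```

Under this assumed rate, five testers find roughly 84 percent of problems and ten find about 98 percent, which is consistent with Rubin’s eight-to-ten figure; a pool as large as Hunter’s 28 mainly buys better trend identification rather than substantially more problem discovery.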

This essay provides a helpful, in-depth study of a usability scenario in an educational context, much of which is applicable to the usability tests I’m preparing to conduct with K-5 educators and parents in winter 2011. Cobus, Dent, and Ondrusek offer best practices gleaned from literature reviews and experience that can be directly applied to the design of my own studies, including how to measure the look and feel and perceived ethos of an educational institution’s web site.



Dicks, R.S. (2002). Mis-Usability: On the Uses and Misuses of Usability Testing. Proceedings of the 20th Annual International Conference on Computer Documentation, 26-30. doi:10.1145/584955.584960

Originally presented at the 2002 International Conference on Computer Documentation in Toronto, Dicks’s brief and passionate (and at times very funny) essay serves as a warning of the increasing misunderstanding and misuse of usability testing. His central thesis is best summed up in the paper’s abstract:

“Usability has become so popular that its value is being threatened by misuse of the term and by misunderstanding about important distinctions between usability studies and empirical usability testing, between usability and verification tests, between ease of use and usefulness” (p. 26).

Dicks provides a brief introduction to automated usability testing software — very interesting to me, as I didn’t know this type of thing existed — and how such software, as well as some face-to-face “testing” conducted by many technical communication practitioners, offers nothing more than quantitative user monitoring data. True usability, based on Gould and Lewis’s four criteria — easy to learn, useful, easy to use, pleasant to use — can only be measured through analysis of both quantitative and qualitative data. He further discusses the limitations of usability testing — for example, that testing always occurs in an artificial situation — and warns against broadly applying usability findings to larger contexts. (For example, if three out of four usability participants experience difficulty completing a particular task on a web site under study, the facilitator cannot claim that 75 percent of potential users will experience the same problem.)

Dicks’s report will prove valuable in designing my own usability tests. I’ve read other reports of his, and I value his expertise and advice. This essay provides a nice framework through which I might filter future usability designs to be sure they’re free of the common snags and pitfalls Dicks describes.



Hartson, H.R., Andre, T.S., Williges, R.C. (2003). Criteria for Evaluating Usability Evaluation Methods. International Journal of Human-Computer Interaction, 15.1, 145-181. doi:10.1207/S15327590IJHC1501_13

Written by Hartson and Williges, professors of computer science and industrial and systems engineering at Virginia Tech, and Andre of the Air Force Research Laboratory, this detailed report uniquely targets usability evaluation methods (UEMs) in the field of human-computer interaction (HCI). The authors first list and then describe in intricate detail a variety of popular UEMs, including heuristic evaluation, cognitive walk-through, usability walk-through, formal usability inspection, and heuristic walk-through. Each of these methods, explain the researchers, is capable of providing authentic and applicable formative and summative usability data in a variety of contexts, both analytic and empirical. The problem is the lack of cross-platform standards — definitions, measures, metrics, processes for evaluation, etc. — which renders comparison of UEMs impossible. The authors offer a heuristic for establishing such standards built on the criteria of thoroughness, validity, effectiveness, reliability, downstream utility, and cost effectiveness.

This study is singular among the documents I’ve chosen for my literature review in focusing on the methods and design of usability evaluation rather than on the application or outcomes of usability testing. Though the perspectives and recommendations of these research engineers are beyond the scope of my inquiry (and, often, my understanding), the report nonetheless offers a historical context that provides a framework and background for my usability research and application. The methods I employ in my own usability testing will certainly be informed by the foundational research conducted by these scholars.



Hertzum, M., Jacobsen, N.E. (2003). The Evaluator Effect: A Chilling Fact About Usability Evaluation Methods. International Journal of Human-Computer Interaction, 15.1, 183-204. doi:10.1207/S15327590IJHC1501_14

Hertzum and Jacobsen define the usability phenomenon known as “the evaluator effect” as the “differences in evaluators’ problem detection and severity ratings” (p. 184). The principal reason for the effect, argue the authors, is that usability evaluation requires interpretation and evaluator judgment in analyzing interactions among users, their tasks, and the systems with which they’re interacting. They point to evidence of the same effect in more mature cognitive activities such as document indexing and medical diagnosis. Individual differences among usability evaluators, they write, “preclude that cognitive activities such as detecting and assessing usability problems are completely consistent across evaluators” (p. 195).

The authors’ research is grounded in three previously conducted studies, each employing a different usability evaluation method (UEM): cognitive walkthrough (CW), heuristic evaluation (HE), and thinking-aloud study (TA). After analyzing the three published studies and identifying the unique evaluator effects inherent in each UEM, Hertzum and Jacobsen offer formulae for measuring the evaluator effect and identify the likely culprits behind all three methods’ primary shortcomings: vague goal analysis, vague evaluation procedures (including anchoring, wherein the test user is too similar to the evaluator to be representative of actual users), and vague problem criteria (essentially the subjectivity of evaluator-determined user problems). Of special interest is the pair’s reference to Nielsen’s 1993 distinction between usability problems (problems concerning how the system is to be operated) and utility problems (problems concerning what the system is capable of doing). Tips for practitioners include being explicit about goal analysis and task selection, involving an extra evaluator in usability tests, and reflecting on evaluation procedures and problem criteria.
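One measure Hertzum and Jacobsen use to quantify the evaluator effect is agreement between evaluators’ detected problem sets — the any-two agreement, the average over all evaluator pairs of the overlap (intersection over union) between the problems each pair found. A minimal sketch, with hypothetical problem labels standing in for real findings:

```python
from itertools import combinations

def any_two_agreement(problem_sets):
    """Average over all evaluator pairs of |Pi & Pj| / |Pi | Pj|."""
    pairs = list(combinations(problem_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Hypothetical problem sets detected by three evaluators of one system.
evaluators = [
    {"navigation", "labels", "search"},
    {"navigation", "search"},
    {"navigation", "labels", "contrast"},
]
# Low agreement values indicate a strong evaluator effect.
print(f"any-two agreement: {any_two_agreement(evaluators):.2f}")
```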



Scowen, G., Regenbrecht, H. (2009). Increased Popularity Through Compliance With Usability Guidelines in E-Learning Web Sites. International Journal of Information Technology and Web Engineering, 4.3, 38-57. doi:10.4018/jitwe.2009100603

The central thesis of Scowen and Regenbrecht’s study is that “increased compliance with usability guidelines does have a correlation with increased popularity of a web site” (p. 50). Using 38 English-language e-learning web sites as their test pool, the authors applied 28 usability guidelines culled from the US Department of Health and Human Services master list of 209 web site usability recommendations, which are based on ISO 9241’s definition of usability as “the effectiveness, efficiency, and satisfaction with which specified users achieve specified goals in particular environments” (p. 39). The authors reference usability guidelines-compliance software employed by many popular web sites that performs automated evaluation of adherence to best practices (e.g., vertical scrollbars and the use of breadcrumbs to enhance user orientation), but they elected not to use such software due to cost constraints and the software’s limitations. The results of the study indicate a correlation between adherence to usability guidelines and popularity, where “popular” is defined by the authors as an amalgam of measures including the number of links pointing to a site from a variety of search engines and the Google PageRank of each site.

This essay is a relatively light read, but one with serious implications for designers of sites aimed at attracting and maintaining loyal user bases. Most important for me is understanding the practicality of automated usability guidelines software. Previous references to such software in the literature I’ve reviewed suggested it was intended as a replacement for authentic usability testing, but I now understand that it can be used in conjunction with other usability efforts like thinking-aloud research (TA). Further, though ultimately limited in ability, automated applications can legitimately be used as stand-alone tools for determining compliance with certain generally accepted best practices. Technical communicators must be sure, though, not to mislabel such activity as “usability.”



Sing, C.C., Der-Thanq, V.C. (2004). A Review on Usability Evaluation Methods for Instructional Multimedia: An Analytical Framework. International Journal of Instructional Media, 31.3, 229-238.

Sing and Der-Thanq, based in Singapore and New Zealand, begin their report with a concise summation of the objective of instructional multimedia products:

“One of the primary tasks of the interface designer is the creation of a user interface that is motivating, interactive, intuitive, and, at the same time, imposing as minimal as possible a cognitive load on the learner. To achieve this goal, qualities such as learnability, efficiency, memorability, low error rates, and high user satisfaction are essential” (p. 229).

They divide usability evaluation methods into two groups: usability testing, which involves users, and usability inspection, which can be conducted independent of users. Their description of thinking-aloud testing (TA) is unique in suggesting the potential caveat that the “technique is limited by the capability of verbalization of the test user” and that “the test user’s verbalization might interfere with the task they are performing” (p. 231). These are excellent points I’d neither thought of nor seen addressed in other literature. The authors further mention a subcategory of TA testing known as constructive interaction, wherein two participants use the system together; they suggest this method produces more authentic data because people are more accustomed to verbalizing when solving problems collaboratively. Interestingly, Sing and Der-Thanq recommend employing usability inspection methods during early design stages (formative evaluation) and usability testing solely for summative evaluation. This is in stark contrast to all other literature I’ve reviewed, which strongly recommends iterative usability testing (and inspection) from product conception to completion.

Though the authors raise some interesting ideas in their brief essay, particularly the concept of constructive interaction, I fear following their recommendations might lead to problems with cost effectiveness and downstream utility. Additionally, though they suggest at the beginning of the essay that their evaluation methods could be used for evaluating instructional multimedia, a specific interest of mine, their study lacks direct application to a real or hypothetical situation, rendering the inclusion of “instructional multimedia” in the essay’s title misleading.



Skov, M.B., Stage, J. (2009). A Conceptual Tool for Usability Problem Identification in Website Development. International Journal of Information Technology and Web Engineering, 4.4, 22-35. doi:10.4018/jitwe.2009100102

In support of a one-page conceptual tool designed by Skov and Stage to help novices identify usability problems, the authors chose 28 undergraduate students at Aalborg University, Denmark, to try out their heuristic evaluation (HE) tool. The researchers hypothesized that data generated by novice HE would be comparable to data obtained by employing usability experts. If their outcomes supported this theory, the authors would have rich information for marketing their usability tool, perhaps contributing to the mainstreaming of HE much in the way usability testing has been streamlined and democratized by Krug.

Their empirical study found that HE performed on a web site by non-professionals identified 72 percent of the problems found by experts, including 86 percent of the most critical problems. These findings could have implications for increased integration of HE into the early stages of document and interface design. Limitations of the study include identified incidents of the evaluator effect among the student participants and the absence of similar studies available for comparison.

This is an interesting report, and I applaud the authors’ attempt to democratize HE, as some evaluation is certainly better than none. I do have some problems with the interchangeable use of “usability testing” for what other researchers call “usability inspection.” The lack of standard vocabulary across the literature is problematic and worth further inquiry into discipline- and industry-wide standardization.



Teoh, K.K., Ong, T.S., Lim, P.W., Liong, R.P.Y., Yap, C.Y. (2009). Explorations of Web Usability. American Journal of Applied Sciences, 6.3, 424-429.

Teoh et al. define web site usability testing as a necessary step toward improving the performance and experience of web users, leading to better overall systems design. They identify the seven most important criteria for web usability evaluation as screen appearance, consistency, web accessibility, navigation, media use, interactivity, and content.

For their study, the authors chose the homepage of The Center of Biometrics and Bioinformatics at Multimedia University, Melaka, Malaysia. Their quantitative research method involved pre- and post-design-change surveying of end users with Likert scale questionnaires. The results indicate overall user approval of the web site modifications and yield a user ranking of the seven usability criteria (content rated highest; consistency scored as least important).

This brief essay will not inform my research, as the authors demonstrate a profound lack of understanding of the elemental concepts and constructs of usability testing, many of which are rooted in experimental psychology and backed by two decades’ worth of research and direct application. Their pre- and post-surveying of participants is more a user satisfaction survey, perhaps better suited to a marketing research study. Further, the authors fail to employ the multiple methods necessary for generating meaningful data, including validation via triangulation.
