Tefko Saracevic discussed the distinction between criteria and measures. Please use more concrete examples to elaborate on these ideas. If a specific criterion is not measurable, how would you conduct an evaluation?
Saracevic’s paragraph drawing a distinction between criteria and measures is worth thinking about precisely because so many worthwhile evaluation criteria out there are so hard to measure.
Saracevic writes “Criteria refer to chosen standard(s) to judge things by. Criteria are then used to develop measures. (To define the differences by examples: time is a criterion, minute is a measure, and watch is a measuring instrument; relevance is a criterion, precision and recall are measures, and human relevance judgment is a measuring instrument).”
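To make his example concrete, here is a minimal sketch of how the two measures fall out of the measuring instrument (the document IDs and relevance judgments below are invented for illustration):

```python
# Relevance is the criterion; precision and recall are measures built on
# top of a human judge's yes/no relevance calls (the measuring instrument).

retrieved = ["d1", "d2", "d3", "d4", "d5"]    # what the system returned
judged_relevant = {"d1", "d3", "d6", "d7"}    # what a human judged relevant

relevant_retrieved = [d for d in retrieved if d in judged_relevant]

precision = len(relevant_retrieved) / len(retrieved)        # 2/5 = 0.40
recall = len(relevant_retrieved) / len(judged_relevant)     # 2/4 = 0.50

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Notice that the criterion (relevance) enters only through the judge’s calls; the measures themselves are just arithmetic layered on top of that human instrument.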
He then proceeds to list dozens of criteria relating to usability and user experience for which it would be quite difficult, if not impossible, to formulate corresponding measures other than “human judgment.” I actually love some of the criteria he lists under the “meta” criterion of usability. How do we measure a DL’s “attractiveness,” “quality of experience,” “effort to understand,” “lostness,” or, my favorite, “irritability”? As Stevie wrote in her post, and as we explored in depth in the previous unit, these things count for a lot on an increasingly glossy, visually captivating, and enjoyable-to-navigate World Wide Web.
And Saracevic did find, in his 2004 survey of existing evaluation studies, a number of evaluation methods that assess these “unmeasurable” criteria by simply observing and questioning human users: surveys, structured interviews, focus groups, observations, think-aloud protocols, and so on.
The Reeves et al. guide Evaluating Digital Libraries: A User-Friendly Guide stresses the importance of these observational evaluation techniques in, I think, a very reasonable way. I liked how they drew on general software usability research like Shneiderman’s (“searching should be a pleasant and rewarding experience”) to inform their guide.
From a usability and user experience standpoint, it’s hard to get very excited about dry quantitative approaches like that of Bollen and Luce. That sort of evaluation is not worthless, but it simply cannot assess much about user experience or usability. I can see how it might help designers understand the community of users in a general way.
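For flavor, here is a hedged sketch of the kind of dry, log-driven analysis I have in mind (the log format and field names are invented; I am not claiming this is Bollen and Luce’s exact method): counting which documents get retrieved together in the same session says something about the user community, but nothing about whether anyone enjoyed the experience.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (session_id, document_id) pairs parsed from a DL access log.
log = [
    ("s1", "docA"), ("s1", "docB"), ("s1", "docC"),
    ("s2", "docA"), ("s2", "docB"),
    ("s3", "docB"), ("s3", "docC"),
]

# Group documents by session.
sessions = defaultdict(set)
for session_id, doc_id in log:
    sessions[session_id].add(doc_id)

# Count co-retrieval: how often each pair of documents shares a session.
co_retrieval = defaultdict(int)
for docs in sessions.values():
    for a, b in combinations(sorted(docs), 2):
        co_retrieval[(a, b)] += 1

for pair, count in sorted(co_retrieval.items(), key=lambda kv: -kv[1]):
    print(pair, count)   # e.g. ('docA', 'docB') 2
```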
Then there are the approaches of Choudhury et al. and Morse, who try their best to transform human judgment measures into hard, dry-feeling numbers.
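A minimal sketch of that general move (the criteria, weights, and ratings are all invented; I am not claiming this is either paper’s actual model): average some questionnaire ratings, weight them, and out comes a single number that looks objective even though every input was a human judgment.

```python
# Mean 1-5 questionnaire ratings per criterion (hypothetical values).
ratings = {"attractiveness": 3.8, "ease of understanding": 2.9, "satisfaction": 4.1}

# Weights someone decided on, summing to 1 (also hypothetical).
weights = {"attractiveness": 0.40, "ease of understanding": 0.35, "satisfaction": 0.25}

composite = sum(ratings[c] * weights[c] for c in ratings)
print(f"composite usability score: {composite:.2f}")
# 3.56: looks hard and objective, but every input was a human judgment
```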
I was left with the impression that funding agencies, as they always do, probably put pressure on DL managers to make their evaluations as dry and quantitative-seeming as possible. I know from experience that the grant/government-funding world often prefers quantitative assessments over qualitative ones. As in other cases, this potentially hurts managers’ ability to effect change in areas like usability and user experience.