Measuring the Un-measurable

Tefko Saracevic discussed the distinction between criteria and measures. Please use more concrete examples to elaborate on the idea. If a specific criterion is not measurable, how would you do evaluation?

Saracevic’s paragraph drawing a distinction between criteria and measures is worth thinking about precisely because so many worthwhile evaluation criteria are so hard to measure.

Saracevic writes “Criteria refer to chosen standard(s) to judge things by. Criteria are then used to develop measures. (To define the differences by examples: time is a criterion, minute is a measure, and watch is a measuring instrument; relevance is a criterion, precision and recall are measures, and human relevance judgment is a measuring instrument).”
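Saracevic’s watch analogy maps neatly onto the relevance case, and a tiny worked example makes the division of labor concrete. In the sketch below (the document IDs and judgments are made up for illustration), the human relevance judgments play the role of the measuring instrument, relevance is the criterion, and precision and recall are the measures computed from the instrument’s output:

```python
# Criterion: relevance. Measuring instrument: a human judge, whose output
# is the (hypothetical) set of documents judged relevant. Measures:
# precision and recall, computed from the instrument's output.

retrieved = ["d1", "d2", "d3", "d4", "d5"]   # what the system returned
relevant = {"d1", "d3", "d6", "d7"}          # what the human judged relevant

# Documents that were both retrieved and judged relevant.
true_positives = sum(1 for d in retrieved if d in relevant)

precision = true_positives / len(retrieved)  # fraction of retrieved that is relevant
recall = true_positives / len(relevant)      # fraction of relevant that was retrieved

print(f"precision = {precision:.2f}")  # 2/5 = 0.40
print(f"recall = {recall:.2f}")        # 2/4 = 0.50
```

The point of the example is how cleanly the layers separate here: swap in a different judge and the measures may change, but the criterion and the measures themselves stay fixed. For criteria like “irritability,” discussed below, it is exactly this middle layer of well-defined measures that is missing.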

He then proceeds to list dozens of criteria relating to usability and user experience for which it would be quite difficult, if not impossible, to formulate corresponding measures other than “human judgment.” I actually love some of the criteria he lists under the “meta” criterion of usability. How do we measure a DL’s “attractiveness,” “quality of experience,” “effort to understand,” “lostness” or, my favorite, “irritability”?  As Stevie wrote in her post, and as we explored in depth in the previous unit, these things count for a lot in an increasingly glossy, visually captivating and enjoyable-to-navigate World Wide Web.

And Saracevic did find, in his 2004 survey of existing evaluation studies, a number of evaluation methods that would assess these “unmeasurable” criteria by simply observing and questioning human users: surveys, structured interviews, focus groups, observations, think-aloud protocols, and so on.

The Reeves, et al. guide Evaluating Digital Libraries: A User-Friendly Guide stresses the importance of these observational evaluation techniques in, I think, a very reasonable way. I liked how they looked to general computer program usability studies like that of Shneiderman (“searching should be a pleasant and rewarding experience”) to inform their guide.

From a usability and user-experience standpoint, it’s hard to get very excited about dry quantitative approaches like that of Bollen and Luce.  That sort of evaluation is not worthless, but it just cannot assess much about user experience or usability. I can see how it might help designers understand the community of users in a general way.

Then there are the approaches of Choudhury et al. and Morse, who try their best to transform human-judgment measures into hard, dry-feeling numbers.

I was left with the impression that funding agencies, as they always do, probably put pressure on DL managers to make their evaluations as dry and quantitative-seeming as possible. I know from experience that the grant/government-funding world often prefers quantitative assessments over qualitative ones.  As in other cases, this potentially hurts managers’ ability to effect change in areas like usability and user experience.


5 responses to “Measuring the Un-measurable”

  1. I like your insights over quan vs. qual. I believe in many scenarios these two should be integrated in A/E process. Results from qual can be inputs to quan, and vice versa.

  2. interesting point about funding agencies putting pressure on DL managers – hadn’t even thought of that! i’m also not surprised that gov funding world leans more towards quantitative vs. qualitative assessments. i agree that this can hurt DL managers when it comes to change…

  3. Brian, you wrote, “I know from experience that the grant/government-funding world often prefers quantitative assessments over qualitative ones.” Is there a reason for this, such as cost?

    • I think it’s cost in part… but I think it’s more a preference for “hard” data when large, sluggish governmental bodies or institutions try to justify their decisions to broad, budget-conscious constituencies.

      I think about multiple-choice testing, which is often favored over essays in testing students, or attempts by No Child Left Behind to put numbers on the “success” or “failure” of schools, and I think of several arts grant applications I’ve filled out that require you to compute the number of audience members your grant-funded project will reach.

      I always worry that quantitative measures make a performance or project more understandable to outsiders at the expense of missing the aspects only visible to those with inside understanding.

  4. There is often an inherent bias against qualitative information that compounds the problem of usability data. I am not sure how this can be overcome though.
