Evaluation Notes

Module 5: Evaluation

What is Evaluation?

  • Evaluation is the structured interpretation and assignment of meaning to the predicted or actual impact of proposals or results.
  • It examines original objectives and what was predicted or accomplished, along with how it was accomplished.
  • It tests usability and functionality of a system.

Key Questions in Evaluation

  • What to evaluate?
  • When to evaluate?
  • Why evaluate?
  • How to evaluate?
  • Where to evaluate?

Types of Evaluation

  • Formative: Occurs during the development of a concept, proposal, project, or organization to improve its value or effectiveness.
  • Summative: Draws lessons from a completed action, project, or organization at a later point in time.

Goals of Evaluation (Why Evaluate?)

  • Assess the extent of system functionality.
  • Assess the effect of the interface on the user.
  • Identify specific problems.

Evaluation Process

  • Tests usability and functionality of a system.
  • Occurs in laboratory, field, and/or in collaboration with users.
  • Evaluates both design and implementation.
  • Should be considered at all stages in the design life cycle.

Approaches & Methods

  • Usability testing
    • Consistency in navigation structure
    • Use of terms
    • System response times
    • User performance
    • Techniques to obtain data: record, questionnaires, interview
  • Field study
    • Natural setting
    • Identify opportunity for new technology
    • Establish requirements
    • Facilitate the introduction of technology
    • Evaluate technology
    • Techniques: recorded audio and video, interview, observation
  • Analytical evaluation
    • Inspection
    • Heuristic evaluation
    • Walkthroughs
    • Theoretically based models to predict user performance
  • Combining approaches
    • Field study to evaluate initial design & get early feedback
    • Make some design changes
    • Usability test to check specific design features
    • Field study to see what happens when used in natural environment
    • Make some final design changes

Evaluating Designs

  • Cognitive Walkthrough
  • Heuristic Evaluation
  • Review-based evaluation

Cognitive Walkthrough

  • Proposed by Polson et al.
  • Evaluates a design on how well it supports the user in learning a task.
  • Usually performed by an expert in cognitive psychology.
  • The expert 'walks through' the design to identify potential problems using psychological principles.
  • Forms are used to guide the analysis.
  • For each task, the walkthrough considers:
    • What impact will the interaction have on the user?
    • What cognitive processes are required?
    • What learning problems may occur?
  • Analysis focuses on goals and knowledge: does the design lead the user to generate the correct goals?

Heuristic Evaluation

  • Proposed by Nielsen and Molich.
  • Usability criteria (heuristics) are identified.
  • Design examined by experts to see if these are violated.
  • Example heuristics:
    • System behavior is predictable
    • System behavior is consistent
    • Feedback is provided
  • Heuristic evaluation 'debugs' the design.
  • Areas of focus: Help, Feedback, and Tolerance; Home Page Usability; Task Orientation; Trust & Credibility; Navigation & Layout; Writing & Content Quality; Search Usability; Page Layout & Visual Design; Forms & Data Entry.
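In practice, each expert rates every heuristic violation for severity and the ratings are aggregated to prioritize fixes. The sketch below assumes a 0-4 severity scale (a common convention from Nielsen's work); the ratings themselves are hypothetical.

```python
# Hypothetical severity ratings (0 = not a problem, 4 = usability catastrophe)
# from three expert evaluators, keyed by the heuristic that was violated.
ratings = {
    "System behavior is predictable": [1, 2, 1],
    "System behavior is consistent":  [3, 4, 3],
    "Feedback is provided":           [0, 1, 0],
}

def mean(xs):
    return sum(xs) / len(xs)

# Rank heuristics by mean severity so the worst violations are fixed first.
worst_first = sorted(ratings, key=lambda h: mean(ratings[h]), reverse=True)
```

Averaging across several evaluators matters because individual experts find different problems and rate them differently.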

Review-based evaluation

  • Results from the literature used to support or refute parts of design.
  • Care needed to ensure results are transferable to new design.

Model-based evaluation

  • Cognitive models are used to filter design options, e.g., GOMS predictions of user performance.
  • Design rationale can also provide useful evaluation information.
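A GOMS-style prediction can be illustrated with the Keystroke-Level Model, a simplified member of the GOMS family. The operator times below are the commonly cited KLM averages from the literature; treat them as rough estimates, not measurements of any particular system.

```python
# Commonly cited KLM operator times in seconds (literature averages).
KLM_TIMES = {
    "K": 0.2,   # keystroke (skilled typist)
    "P": 1.1,   # point at a target with the mouse
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def predict_time(operators: str) -> float:
    """Sum operator times for a sequence such as 'MHPK'."""
    return round(sum(KLM_TIMES[op] for op in operators), 2)

# e.g. think, move hand to mouse, point at a menu item, click:
t = predict_time("MHPK")
```

Comparing predicted times for two candidate designs lets an evaluator filter options before building anything.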

Evaluating through User Participation

Laboratory studies

  • Advantages:
    • Specialist equipment available
    • Uninterrupted environment
  • Disadvantages:
    • Lack of context
    • Difficult to observe several users cooperating
  • Appropriate:
    • If the system location is dangerous or impractical
    • For constrained single-user systems
    • To allow controlled manipulation of use

Field Studies

  • Advantages:
    • Natural environment
    • Context retained (though observation may alter it)
    • Longitudinal studies possible
  • Disadvantages:
    • Distractions
    • Noise
  • Appropriate:
    • Where context is crucial
    • For longitudinal studies

Evaluating Implementations

Experimental evaluation

  • Controlled evaluation of specific aspects of interactive behavior
  • Evaluator chooses hypothesis to be tested
  • A number of experimental conditions are considered which differ only in the value of some controlled variable.
  • Changes in behavioral measure are attributed to different conditions

Experimental factors

  • Subjects
    • who - representative, sufficient sample
  • Variables
    • things to modify and measure
  • Hypothesis
    • what you'd like to show
  • Experimental design
    • how you are going to do it

Variables

  • Independent variable (IV): Characteristic changed to produce different conditions, e.g., interface style, number of menu items.
  • Dependent variable (DV): Characteristics measured in the experiment, e.g., time taken, number of errors.

Hypothesis

  • Prediction of outcome
  • Framed in terms of IV and DV, e.g., "error rate will increase as font size decreases"
  • Null hypothesis: States no difference between conditions; aim is to disprove this, e.g., null hyp. = "no change with font size"

Experimental design

  • Within groups design:
    • Each subject performs experiment under each condition.
    • Transfer of learning between conditions is possible.
    • Less costly and less likely to suffer from user variation.
  • Between groups design:
    • Each subject performs under only one condition
    • No transfer of learning
    • More users required
    • Variation can bias results.
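The transfer-of-learning problem in within-groups designs is usually mitigated by counterbalancing condition order. A cyclic Latin square, sketched below, ensures each condition appears once in every position; note that fully balancing immediate carry-over effects needs a balanced Latin square, which this simple version does not provide.

```python
def latin_square(conditions):
    """Cyclic Latin square: each condition appears once in each position."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

# Hypothetical interface conditions for a within-groups experiment:
orders = latin_square(["menu", "toolbar", "shortcut"])
# Subject 1 runs menu -> toolbar -> shortcut, subject 2 starts with toolbar, etc.
```

Subjects are assigned to the rows in rotation, so any learning effect is spread evenly across conditions rather than biasing one of them.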

Analysis of data

  • Before you start to do any statistics:
    • look at data
    • save original data
  • Choice of statistical technique depends on:
    • type of data
    • information required
  • Type of data
    • discrete - finite number of values
    • continuous - any value
  • What information is required?
    • is there a difference?
    • how big is the difference?
    • how accurate is the estimate?
  • Parametric and non-parametric tests mainly address the first of these.
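As an example of a non-parametric test (suitable when the data cannot be assumed normally distributed), the sign test below checks "is there a difference?" for paired observations. The implementation and the example pairs are an illustrative sketch.

```python
from math import comb

def sign_test_p(pairs):
    """Two-sided sign test p-value for paired observations (non-parametric)."""
    diffs = [a - b for a, b in pairs if a != b]   # ties are dropped
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)
    # Probability of a split at least this extreme under the null (p = 0.5),
    # doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Eight hypothetical subjects, each faster under condition A than condition B:
p = sign_test_p([(5, 3)] * 8)
```

The sign test only uses the direction of each difference, so it answers "is there a difference?" but says nothing about how big that difference is.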

Experimental studies on groups

  • More difficult than single-user experiments
  • Problems with:
    • subject groups
    • choice of task
    • data gathering
    • analysis

Observational Methods

Interviews

  • The analyst questions the user on a one-to-one basis.
  • Usually based on prepared questions
  • Informal, subjective, and relatively cheap
  • Advantages:
    • Can be varied to suit context
    • Issues can be explored more fully
    • Can elicit user views and identify unanticipated problems
  • Disadvantages:
    • Very subjective
    • Time consuming

Questionnaires

  • Set of fixed questions given to users
  • Advantages:
    • Quick and reaches large user group
    • Can be analyzed more rigorously
  • Disadvantages:
    • Less flexible
    • Less probing

Questionnaires (ctd)

  • Need careful design.
    • what information is required?
    • how are answers to be analyzed?
  • Styles of question
    • general
    • open-ended
    • scalar
    • multi-choice
    • ranked
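Scalar questions in particular lend themselves to rigorous analysis. The sketch below scores hypothetical 5-point scalar (Likert-style) responses; the question wording, data, and "agreement" definition are illustrative assumptions.

```python
# Hypothetical 5-point scalar responses to "The system was easy to use",
# where 1 = strongly disagree and 5 = strongly agree.
responses = [4, 5, 3, 4, 2, 5, 4]

def summarise(scores, scale_max=5):
    """Mean score and percentage of responses above the scale midpoint."""
    mean = sum(scores) / len(scores)
    midpoint = (scale_max + 1) / 2
    agree = sum(1 for s in scores if s > midpoint) / len(scores)
    return round(mean, 2), round(agree * 100)

mean_score, pct_agree = summarise(responses)
```

Fixed scalar questions can be aggregated like this across a large user group, which is exactly the advantage questionnaires have over interviews.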

Physiological methods

Eye tracking

  • Head- or desk-mounted equipment tracks the position of the eye.
  • Eye movement reflects the amount of cognitive processing a display requires
  • Measurements include:
    • Fixations: the eye maintains a stable position; the number and duration of fixations indicate the level of difficulty with a display
    • Saccades: rapid eye movements from one point of interest to another
    • Scan paths: moving straight to a target with a short fixation at the target is optimal
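Fixations are typically extracted from the raw gaze stream automatically. The sketch below is a simplified dispersion-threshold detector (in the spirit of the I-DT algorithm); the threshold values and gaze samples are illustrative, not standard settings.

```python
def _dispersion(window):
    """Dispersion of a window of (x, y) gaze samples: x-range + y-range."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(gaze, max_dispersion=1.0, min_samples=3):
    """Return (start, end) index pairs of fixations in a list of (x, y) samples."""
    fixations, i = [], 0
    while i < len(gaze):
        j = i + min_samples
        # A fixation starts if the minimum window stays within the threshold...
        if j <= len(gaze) and _dispersion(gaze[i:j]) <= max_dispersion:
            # ...and grows while added samples keep the dispersion small.
            while j < len(gaze) and _dispersion(gaze[i:j + 1]) <= max_dispersion:
                j += 1
            fixations.append((i, j - 1))
            i = j
        else:
            i += 1
    return fixations
```

Counting the detected fixations and measuring their durations gives exactly the difficulty indicators listed above; the gaps between them are the saccades.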

Physiological measurements

  • Emotional response linked to physical changes
  • These may help determine a user's reaction to an interface
  • Measurements include:
    • Heart activity, including blood pressure, volume and pulse.
    • Activity of sweat glands: Galvanic Skin Response (GSR)
    • Electrical activity in muscle: electromyogram (EMG)
    • Electrical activity in brain: electroencephalogram (EEG)
  • Some difficulty in interpreting these physiological responses - more research needed