Evaluation Notes
Module 5: Evaluation
What is Evaluation?
- Evaluation is the structured interpretation and giving of meaning to predicted or actual impacts of proposals or results.
- It examines original objectives and what was predicted or accomplished, along with how it was accomplished.
- It tests usability and functionality of a system.
Key Questions in Evaluation
- What to evaluate?
- When to evaluate?
- Why evaluate?
- How to evaluate?
- Where to evaluate?
Types of Evaluation
- Formative: Occurs during the development of a concept, proposal, project, or organization to improve its value or effectiveness.
- Summative: Draws lessons from a completed action, project, or organization at a later point in time.
Goals of Evaluation (Why Evaluate?)
- Assess the extent of system functionality.
- Assess the effect of the interface on the user.
- Identify specific problems.
Evaluation Process
- Tests usability and functionality of a system.
- Occurs in laboratory, field, and/or in collaboration with users.
- Evaluates both design and implementation.
- Should be considered at all stages in the design life cycle.
Approaches & Methods
- Usability testing
- Consistency in navigation structure
- Use of terms
- How the system responds
- User performance
- Techniques to obtain data: recording, questionnaires, interviews
- Field study
- Natural setting
- Identify opportunity for new technology
- Establish requirements
- Facilitate the introduction of technology
- Evaluate technology
- Techniques: recorded audio and video, interview, observation
- Analytical evaluation
- Inspection
- Heuristic evaluation
- Walkthrough
- Theoretically based model to predict user performance
- Combining approaches
- Field study to evaluate initial design & get early feedback
- Make some design changes
- Usability test to check specific design features
- Field study to see what happens when used in natural environment
- Make some final design changes
Evaluating Designs
- Cognitive Walkthrough
- Heuristic Evaluation
- Review-based evaluation
Cognitive Walkthrough
- Proposed by Polson et al.
- Evaluates the design on how well it supports the user in learning a task.
- Usually performed by an expert in cognitive psychology.
- The expert 'walks through' the design to identify potential problems using psychological principles.
- Forms used to guide analysis.
- For each task, the walkthrough considers:
- What impact will the interaction have on the user?
- What cognitive processes are required?
- What learning problems may occur?
- Analysis focuses on goals and knowledge: does the design lead the user to generate the correct goals?
Heuristic Evaluation
- Proposed by Nielsen and Molich.
- Usability criteria (heuristics) are identified.
- Design examined by experts to see if these are violated.
- Example heuristics:
- System behavior is predictable
- System behavior is consistent
- Feedback is provided
- Heuristic evaluation 'debugs' the design; a sketch of how findings might be recorded follows this list.
- Areas of focus: help, feedback, and tolerance; home page usability; task orientation; trust and credibility; navigation and layout; writing and content quality; search usability; page layout and visual design; forms and data entry.
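Evaluation teams commonly record each violation with a severity rating so that fixes can be prioritized. Below is a minimal sketch of one way to do this; the Finding structure, function names, and sample findings are hypothetical, while the 0-4 severity scale follows Nielsen's common convention.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    heuristic: str   # which heuristic is violated, e.g. "Consistency"
    location: str    # where in the interface the problem appears
    severity: int    # Nielsen's scale: 0 = not a problem ... 4 = catastrophe

def worst_first(findings: list[Finding]) -> list[Finding]:
    """Order findings so the most severe problems are addressed first."""
    return sorted(findings, key=lambda f: f.severity, reverse=True)

findings = [
    Finding("Feedback is provided", "Save button", 3),
    Finding("System behavior is consistent", "Menu labels", 2),
    Finding("System behavior is predictable", "Undo action", 4),
]
for f in worst_first(findings):
    print(f"[severity {f.severity}] {f.heuristic}: {f.location}")
```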
Review-based evaluation
- Results from the literature used to support or refute parts of design.
- Care needed to ensure results are transferable to new design.
Model-based evaluation
- Cognitive models are used to filter design options, e.g. a GOMS prediction of user performance (see the sketch after this list).
- Design rationale can also provide useful evaluation information.
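As a concrete illustration of model-based prediction, the Keystroke-Level Model (KLM, the simplest GOMS variant) estimates expert task time by summing standard operator times. The sketch below is minimal: the operator values are commonly cited KLM estimates that vary by source and user skill, and the task breakdown is invented.

```python
# Commonly cited KLM operator time estimates (seconds); treat the
# exact values as assumptions, since sources and user skill vary.
KLM_OPERATORS = {
    "K": 0.2,   # keystroke (skilled typist)
    "P": 1.1,   # point with a mouse to a target
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
    "B": 0.1,   # mouse button press or release
}

def predict_time(sequence: str) -> float:
    """Sum operator times for a task written as a string of operators."""
    return sum(KLM_OPERATORS[op] for op in sequence)

# Hypothetical task: think (M), point at a field (P), click (BB),
# then type a four-character code (KKKK).
task = "MPBB" + "K" * 4
print(f"Predicted expert time: {predict_time(task):.2f} s")
```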
Evaluating through User Participation
Laboratory studies
- Advantages:
- Specialist equipment available
- Uninterrupted environment
- Disadvantages:
- Lack of context
- Difficult to observe several users cooperating
- Appropriate:
- If the system location is dangerous or impractical
- For constrained single-user systems
- To allow controlled manipulation of use
Field Studies
- Advantages:
- Natural environment
- Context retained (though observation may alter it)
- Longitudinal studies possible
- Disadvantages:
- Distractions and noise
- Appropriate:
- Where context is crucial
- For longitudinal studies
Evaluating Implementations
Experimental evaluation
- Controlled evaluation of specific aspects of interactive behavior
- Evaluator chooses hypothesis to be tested
- A number of experimental conditions are considered which differ only in the value of some controlled variable.
- Changes in behavioral measure are attributed to different conditions
Experimental factors
- Subjects
- who - representative, sufficient sample
- Variables
- things to modify and measure
- Hypothesis
- what you want to show
- Experimental design
- how you are going to do it
Variables
- Independent variable (IV): Characteristic changed to produce different conditions, e.g., interface style, number of menu items.
- Dependent variable (DV): Characteristic measured in the experiment, e.g., time taken, number of errors.
Hypothesis
- Prediction of outcome
- Framed in terms of IV and DV, e.g., "error rate will increase as font size decreases"
- Null hypothesis: States no difference between conditions; aim is to disprove this, e.g., null hyp. = "no change with font size"
Experimental design
- Within groups design:
- Each subject performs the experiment under each condition.
- Transfer of learning possible
- Less costly and less likely to suffer from user variation.
- Between groups design:
- Each subject performs under only one condition
- No transfer of learning
- More users required
- Individual differences between subjects can bias results (a sketch of both assignment schemes follows this list).
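A minimal sketch contrasting the two assignment schemes; subject IDs and condition names are invented. Between groups randomly splits subjects across conditions; within groups gives every subject every condition, rotating the order to counterbalance transfer of learning.

```python
import itertools
import random

subjects = [f"S{i}" for i in range(1, 7)]
conditions = ["menu", "command-line"]

# Between groups: each subject is randomly assigned to one condition.
random.shuffle(subjects)
half = len(subjects) // 2
between = {s: conditions[0] for s in subjects[:half]}
between.update({s: conditions[1] for s in subjects[half:]})

# Within groups: each subject sees every condition; cycling through all
# orderings counterbalances transfer-of-learning effects.
orders = list(itertools.permutations(conditions))
within = {s: list(orders[i % len(orders)]) for i, s in enumerate(subjects)}

print("between:", between)
print("within :", within)
```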
Analysis of data
- Before you start to do any statistics:
- look at data
- save original data
- Choice of statistical technique depends on:
- type of data
- information required
- Type of data
- discrete - finite number of values
- continuous - any value
- What information is required?
- is there a difference?
- how big is the difference?
- how accurate is the estimate?
- Parametric and non-parametric tests mainly address the first of these; a minimal testing sketch follows below.
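A minimal sketch of testing "is there a difference?" between two conditions; the completion times below are invented for illustration. An independent-samples t-test is a common parametric choice for continuous, roughly normal data; the Mann-Whitney U test is a non-parametric alternative.

```python
from scipy import stats

# Invented task completion times (seconds) under two interface conditions.
condition_a = [12.1, 10.4, 11.8, 13.0, 12.5, 10.9]
condition_b = [14.2, 13.5, 15.1, 13.9, 14.8, 15.4]

# Parametric: independent-samples t-test (assumes roughly normal data).
t, p = stats.ttest_ind(condition_a, condition_b)
print(f"t = {t:.2f}, p = {p:.4f}")

# Non-parametric alternative when normality is doubtful.
u, p = stats.mannwhitneyu(condition_a, condition_b)
print(f"U = {u:.1f}, p = {p:.4f}")
```

A small p-value lets the evaluator reject the null hypothesis of no difference; how big and how accurate the difference is requires effect sizes and confidence intervals, which these tests alone do not give.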
Experimental studies on groups
- More difficult than single-user experiments
- Problems with:
- subject groups
- choice of task
- data gathering
- analysis
Observational Methods
Interviews
- Analyst questions the user on a one-to-one basis
- Usually based on prepared questions
- Informal, subjective, and relatively cheap
- Advantages:
- Can be varied to suit context
- Issues can be explored more fully
- Can elicit user views and identify unanticipated problems
- Disadvantages:
- Very subjective
- Time consuming
Questionnaires
- Set of fixed questions given to users
- Advantages:
- Quick and reaches large user group
- Can be analyzed more rigorously
- Disadvantages:
- Less flexible
- Less probing
Questionnaires (continued)
- Need careful design.
- what information is required?
- how are answers to be analyzed? (a sketch for scalar responses follows the list below)
- Styles of question
- general
- open-ended
- scalar
- multi-choice
- ranked
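For scalar questions (e.g., 1-5 agreement ratings), a minimal analysis sketch follows; the question wording and responses are invented. Because such ratings are ordinal, the median and mode are often safer summaries than the mean.

```python
from statistics import mean, median, mode

# Invented responses to "The system was easy to use"
# (1 = strongly disagree ... 5 = strongly agree).
responses = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]

print(f"mean   = {mean(responses):.2f}")  # common, but assumes interval data
print(f"median = {median(responses)}")    # robust for ordinal scales
print(f"mode   = {mode(responses)}")      # most frequent rating
```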
Physiological methods
Eye tracking
- Head-mounted or desk-mounted equipment tracks the position of the eye.
- Eye movement reflects the amount of cognitive processing a display requires
- Measurements include:
- Fixations: the eye maintains a stable position; their number and duration indicate the level of difficulty with the display (a detection sketch follows this list)
- Saccades: rapid eye movement from one point of interest to another
- Scan paths: moving straight to a target with a short fixation at the target is optimal
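A minimal sketch of dispersion-based fixation detection, in the spirit of the I-DT algorithm; the thresholds and the gaze trace are invented, and real eye trackers ship their own detectors.

```python
def detect_fixations(samples, max_dispersion=30.0, min_samples=5):
    """Dispersion-based (I-DT style) fixation detection.

    samples: (x, y) gaze points recorded at a fixed sampling rate.
    Returns (start_index, end_index) pairs, one per detected fixation.
    """
    def dispersion(pts):
        xs, ys = zip(*pts)
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations, i = [], 0
    while i + min_samples <= len(samples):
        j = i + min_samples
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            fixations.append((i, j - 1))
            i = j
        else:
            i += 1  # no fixation starts here; slide the window forward
    return fixations

# Invented trace: a fixation, a saccade, then a second fixation.
trace = [(100, 100), (102, 101), (101, 99), (100, 102), (103, 100),
         (400, 300), (401, 299), (402, 301), (400, 300), (399, 302)]
print(detect_fixations(trace))  # -> [(0, 4), (5, 9)]
```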
Physiological measurements
- Emotional response linked to physical changes
- These may help determine a user's reaction to an interface
- Measurements include:
- Heart activity, including blood pressure, volume and pulse.
- Activity of sweat glands: Galvanic Skin Response (GSR)
- Electrical activity in muscle: electromyogram (EMG)
- Electrical activity in brain: electroencephalogram (EEG)
- Some difficulty remains in interpreting these physiological responses; more research is needed.