User Studies

Framework

Design has a specific meaning in the context of studies/experiments

Ethics

  • Any study with human participation requires an ethics review

    • Any form of study that collects data from humans

    • Including online surveys

  • By conducting research ethics reviews and adhering to guidelines we ensure that participation is

    • Voluntary - do not coerce people into taking part

    • Informed - provide accurate information about what the study entails

    • Safe - do not expose participants to unsafe situations or practices

Informed Consent

  • You need to obtain informed consent from each participant

    • BEFORE the study begins

  • Participants must be able to make an informed decision

  • You must provide participants with an information sheet, time to read it, and an opportunity to ask questions

  • Participants must provide explicit consent before you proceed

Study Objectives

What purpose do user studies have?
Some typical evaluation goals:

  • Comparing products - which is better suited for the tasks we have?

  • Completing transactions - how easy or difficult are they to complete?

  • Frequently used systems - how efficient is the interface for experts?

  • Navigation - can people find the information and features they need?

  • Safety-critical interface - can users operate the interface without error?

  • Creating an overall positive experience - do users respond positively?

  • Impact of subtle changes - how does it affect user behaviour?

Criteria

What criteria are relevant for a given purpose?
General usability criteria

  • Effectiveness – Can users accomplish their goals with the interface? How easy or demanding is task completion?

  • Efficiency – How quickly and accurately can users complete the tasks? How much effort is involved?

  • Satisfaction – How satisfied are users with their experience of using the system?

Other, more specific criteria might also be important

Experiment Variables

Independent variables = Factors

  • Something that is manipulated or systematically controlled

  • In controlled experiments with users, these are called factors

  • Each combination of factor levels defines a test condition

Dependent variables = Measurements

  • Something we measure in an experiment

  • In user studies - a human behaviour or response

    • Performance measures

    • User-reported measures

Identify Factors and Measures

  • What are the factors and conditions that you want to study and compare?

  • Focus on one factor if possible (keep it simple)

  • More factors make it harder to determine cause-effect relationships

  • But sometimes, we are interested in the interaction between factors

    • Example - can people select targets faster by touch or by eye gaze on a touchscreen? Does that depend on the size of the target?

  • Choose measures in accordance with your objectives and criteria

  • In experiments, generally aim for a small number of conditions and a large number of repetitions
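As an illustration of how factors and levels define test conditions, the sketch below enumerates every condition for the touch vs. gaze example. The factor names and the two target sizes are assumptions made for illustration, not part of any particular study.

```python
from itertools import product

# Hypothetical factors and levels, loosely based on the touch vs. gaze example.
factors = {
    "input": ["touch", "gaze"],
    "target_size": ["small", "large"],
}

# A test condition is one level chosen from each factor.
conditions = [
    dict(zip(factors, levels)) for levels in product(*factors.values())
]

for condition in conditions:
    print(condition)

print(f"{len(conditions)} conditions in total")
```

Adding a third factor or more levels multiplies the number of conditions, which is one reason to keep the design simple.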

Levels of Measurement

  • Anything we measure in a study is a variable

  • Each possible value of a variable is an attribute

  • The level of measurement of a variable is defined by the relationship among the attributes

    • Nominal - attributes have names but no relationship

    • Ordinal - attributes are ordered

    • Interval - the distance between the possible values is meaningful

    • Ratio - ratios (multiples, percentages) are meaningful

  • Levels of measurement define different types of data
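The level of measurement limits which summary statistics make sense. The sketch below uses made-up data from five hypothetical participants: the mode for a nominal variable, the median for an ordinal one, and the mean and a ratio for a ratio-level one. The variable names and values are assumptions for illustration only.

```python
from statistics import mean, median, mode

# Hypothetical data from five participants.
preferred_input = ["touch", "gaze", "touch", "touch", "gaze"]  # nominal
preference_rank = [1, 2, 1, 3, 2]                # ordinal: rank given to a design
task_time_s = [12.0, 15.0, 11.0, 14.0, 13.0]     # ratio: seconds, true zero

# Nominal: only counting is meaningful, so report the most frequent attribute.
most_common_input = mode(preferred_input)

# Ordinal: order is meaningful, so the median is a sensible summary.
median_rank = median(preference_rank)

# Ratio: distances and ratios are meaningful, so the mean and
# statements such as "x times slower" are valid.
mean_time = mean(task_time_s)
slowest_vs_fastest = max(task_time_s) / min(task_time_s)

print(most_common_input, median_rank, mean_time, round(slowest_vs_fastest, 2))
```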

Study Design

Once we know what our variables are, we can consider the design of the study. This includes:

  • Design (= structure of the experiment, usually just called the design)

    • Factorial design – how the experiment is structured by factors and levels

    • Participant grouping – one or more groups

  • Experimental setup, referred to as apparatus

    • Hardware and software, spatial arrangement of participant and devices

  • Tasks and Procedure

    • The tasks participants are asked to complete

    • Sequence of events in the study

Within-Subjects or Between-Subjects

Within-subjects design

  • Each participant performs the same tasks with each of the test conditions

Between-subjects design

  • Participants are put into groups that each use different test conditions

Some variables (factors) require a between-subjects design
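For a between-subjects design, participants have to be split into groups. The sketch below assumes random assignment with equal group sizes, which is a common choice but not something these notes prescribe. The function name `assign_groups` and the participant IDs are hypothetical.

```python
import random


def assign_groups(participants, conditions, seed=None):
    """Randomly split participants into one group per condition.

    Group sizes differ by at most one participant.
    """
    rng = random.Random(seed)      # seed only makes the example reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)

    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        condition = conditions[index % len(conditions)]
        groups[condition].append(participant)
    return groups


participants = [f"P{i}" for i in range(1, 9)]
groups = assign_groups(participants, ["A", "B"], seed=1)
for condition, members in groups.items():
    print(condition, members)
```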

Tasks

  • Central to usability tests and user studies

    • If there is no task, then what you do is neither a test nor an experiment

  • In usability tests, participants are given typical tasks that users would perform with the user interface, to find out whether they encounter problems

  • In user studies that measure performance, we have a trade-off

    • Use typical tasks -> representative of real application

    • Use abstract tasks -> more control, for observation of how performance depends on

Skill-Based vs Knowledge-Based

  • Skill-based tasks lend themselves to repetition and to being performed under different test conditions

    • Examples - reaction time, selection from menus, text entry

  • Knowledge-based tasks are more problematic because users gain knowledge as they perform the task, so the task content needs careful variation

    • Looking up information on a web site (vary what needs to be looked up)

    • Finding a train connection (vary the task)

    • Extract information from visualisations (vary data visualized)

Order Effects

  • In a within-subjects design, there can be order effects on the results

  • Learning effects

    • Participants may perform better on a second condition because they benefitted from practice on the first (more practiced with the task)

  • Fatigue effects

    • Participants might get tired if the task is demanding, or they might get bored and less attentive if the task is repetitive

Counterbalancing

  • Used to compensate for any order effect or sequence effect

  • Divide participants into groups that are each given the test conditions in a different order

  • If we have two conditions A and B:

    • Half of the users first use A, then B

    • The other half first use B, then A

  • If we have more conditions, use Latin squares instead of all permutations
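A minimal sketch of counterbalancing with a Latin square, assuming the simplest (cyclic) construction: each row is the condition list rotated by one position, so every condition appears once in every position. This balances position only. A balanced Latin square, which also controls which condition precedes which, is a different construction not shown here. The function names and participant IDs are hypothetical.

```python
def latin_square(conditions):
    """Return a cyclic Latin square: row i is the list rotated left by i."""
    n = len(conditions)
    return [
        [conditions[(row + col) % n] for col in range(n)]
        for row in range(n)
    ]


def assign_orders(participants, conditions):
    """Give each participant one row of the square, cycling through the rows."""
    square = latin_square(conditions)
    return {
        participant: square[index % len(square)]
        for index, participant in enumerate(participants)
    }


orders = assign_orders(["P1", "P2", "P3", "P4", "P5", "P6"], ["A", "B", "C"])
for participant, order in orders.items():
    print(participant, " -> ".join(order))
```

With three conditions this needs only 3 orders instead of all 6 permutations. Recruiting a multiple of the number of conditions keeps the orders equally represented.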

Procedure

  • Encompasses everything that the participant does or is exposed to, from the moment they arrive for the study until they leave

  • The tasks that participants perform and the specific instructions, demonstration or practice they are given for their task

  • The order in which test conditions are administered, and how many repetitions/trials of the task in each condition

  • Includes the consent procedure and the questionnaires given to participants at the start (demographics), after each task, and/or after the test

  • Time for breaks between tasks/conditions, and total time for a session

Sampling

  • The selection of participants, as a sample from a target population

  • Ideally, the results of a study should hold for people in a target population who were not tested

  • Sampling is a major concern for survey research (e.g. opinion polls)

  • Experiments can produce statistically valid conclusions with relatively small samples

    • Just enough users to have confidence that we would get the same result with any other sample from the target population

Reporting of Participants

  • Studies collect and report demographic data that describes the study population

  • Age, gender, and any other characteristics relevant to the study

  • This helps readers understand how representative the sample is

Analysis

  • When we evaluate designs, we are interested in the differences between the designs, and the effect of those differences on performance and UX

  • The people participating in a study will also be individually different in their performance and experience, but that is not what we are interested in

    • For data analysis, we consider people a random factor

  • The data collected from participants is analysed statistically

    • Averaging performance across participants

    • Extrapolating from our sample to other users (people who are our target users but who were not tested)

Descriptive Statistics

  • Used for describing one data set (one group or condition)

  • Essential for any interval and ratio-level data collected from participants

    • How did participants rate ease-of-use, on average?

    • How much did time on task vary from participant to participant?

    • In what range do we expect the task success rate to fall for our entire target group?

  • Report mean and standard deviation to show the distribution of the data points collected from across participants

    • How big an influence are individual differences on our data?

  • Report the confidence interval to show the range in which we expect the mean value for all potential users to lie
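A small sketch of the first two summary values, using made-up task times from five hypothetical participants. It assumes the sample standard deviation (dividing by n - 1), which is the usual choice when the participants are a sample from a larger target population.

```python
from statistics import mean, stdev

# Hypothetical time-on-task measurements in seconds, one per participant.
task_times_s = [12.0, 15.0, 11.0, 14.0, 13.0]

average = mean(task_times_s)   # central tendency across participants
spread = stdev(task_times_s)   # sample standard deviation (n - 1 denominator)

print(f"mean = {average:.2f} s, standard deviation = {spread:.2f} s")
```

A large standard deviation relative to the mean suggests that individual differences have a strong influence on the data.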

Confidence Intervals (CI)

  • Represent the range in which we expect the average value for all possible users to lie

  • Excel function:

    • CONFIDENCE(alpha, standard deviation, sample size)

    • Returns the margin of error; the CI is the mean ± this value

  • alpha = 0.05 if we want 95% confidence

  • More variance (std dev) -> wider CI

  • More users -> narrower CI (4x as many users -> half the CI width)
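The same calculation can be sketched in Python. This assumes the normal-distribution formula that Excel's CONFIDENCE function uses: margin = z * standard deviation / sqrt(sample size), with the interval given by mean ± margin. The function name `confidence_margin` and the example numbers are hypothetical. For small samples a t-distribution is often preferred (CONFIDENCE.T in Excel), which gives a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist


def confidence_margin(alpha, std_dev, sample_size):
    """Half-width of a normal-based confidence interval for the mean."""
    # Two-sided critical value, e.g. about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * std_dev / sqrt(sample_size)


# Hypothetical study: mean task time 13.0 s, standard deviation 2.0 s.
sample_mean = 13.0
margin_10 = confidence_margin(0.05, 2.0, 10)
margin_40 = confidence_margin(0.05, 2.0, 40)

print(f"n = 10: {sample_mean - margin_10:.2f} to {sample_mean + margin_10:.2f}")
print(f"n = 40: {sample_mean - margin_40:.2f} to {sample_mean + margin_40:.2f}")
```

Because the margin shrinks with the square root of the sample size, four times as many participants halve the width of the interval, and a larger standard deviation widens it in direct proportion.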