HCI Scientific Experiments Notes

Lecture Goals
  • Explain the fundamental reasons for conducting experiments in HCI.

  • Define a hypothesis and its critical role in experimental design.

  • Describe the process of operationalizing a hypothesis to make it testable.

  • Understand and differentiate various research methods used in HCI.

  • Define independent and dependent variables, and explain how they are used to establish cause-and-effect relationships.

  • Explain confounding variables and how to control them in experiments.

  • Differentiate between-group and within-group designs, outlining their advantages and disadvantages.

Readings
  • Chapter 4 ("Scientific Foundations"): Sections 4.1 to 4.8, focusing on the principles of scientific research relevant to HCI.

  • Chapter 5 ("Designing HCI Experiments"): Sections 5.1 to 5.14, covering the specifics of designing and conducting HCI experiments.

HCI Design Process and Research
  • The HCI design process involves:

    • Identifying needs through user research and analysis.

    • Ideation to generate potential design solutions.

    • Prototyping to create tangible versions of the design.

    • Evaluation to assess the design's usability and user experience.

    • Implementation to finalize and deploy the design.

  • Research is crucial in this process to inform design decisions and validate outcomes.

What is Research?
  • Research is the systematic investigation or experimentation aimed at the discovery and interpretation of facts.

  • It also includes the revision of accepted theories or laws in light of new facts, ensuring continuous improvement and innovation.

Conceptual vs. Empirical Research
Conceptual Research
  • Focuses on theoretical exploration and development of concepts and ideas, providing a foundation for practical applications.

  • Deals with abstract and hypothetical concepts, often involving literature reviews, theoretical models, and philosophical arguments.

  • Aims to generate new knowledge or refine existing theories, contributing to the understanding of underlying principles.

  • Does not involve direct observation or collection of empirical data, relying instead on logical reasoning and theoretical frameworks.

  • Provides a theoretical foundation for empirical research, guiding the formulation of hypotheses and research questions.

Empirical Research
  • Focuses on gathering empirical evidence and testing hypotheses through direct observation and measurement.

  • Deals with real-world observations and data, using experiments, surveys, and other data collection methods.

  • Aims to test hypotheses, validate theories, or solve practical problems by collecting and analyzing empirical data.

  • Involves direct observation, data collection, and analysis of empirical data, using statistical and qualitative techniques.

  • Builds upon conceptual frameworks by testing and validating them, providing evidence-based insights and conclusions.

Why Conduct Research?
  • Scientists aim to understand cause and effect relationships to explain phenomena and predict outcomes.

  • Example: When metal is heated, it expands due to increased molecular motion.

  • To make predictions based on established cause-and-effect relationships.

  • Example: The metal in a bridge needs space to expand in hot weather to prevent structural damage.

Why Conduct HCI Research?
  • To test usability and user experience of interactive systems and interfaces.

  • Examples:

    • Keyboard 1 allows faster input than keyboard 2, improving user efficiency.

    • Users make fewer errors using interface X than Y, enhancing user accuracy.

    • 70% of older adults can use mobile app XX without instructions, demonstrating ease of use.

  • Evaluation in HCI has three primary goals:

    1. Identify specific problems or errors in the design that hinder usability.

    2. Assess the usability and accessibility of the design, ensuring it meets user needs and preferences.

    3. Assess users' experience with the design, including satisfaction, enjoyment, and perceived value.

Usability vs. User Experience (UX)
  • Usability: Focuses on the effectiveness, efficiency, learnability, memorability, and error prevention of a design.

  • User Experience (UX): Encompasses the overall satisfaction, enjoyment, pleasure, fun, and perceived value a user derives from interacting with a system.

Research Requirements
  • Reproducible (rigorous method): The research method must be clearly documented and replicable by other researchers.

  • Peer-reviewed and published (detailed report): The research findings must be reviewed by experts and published in reputable journals or conferences.

  • Builds on previous scientific evidence (references & citations): The research must be grounded in existing literature and properly cite relevant sources.

Research Methods
  • Observational methods to study user behavior in natural settings.

  • Experimental methods to manipulate variables and establish cause-and-effect relationships.

  • Correlational methods to examine relationships between variables without inferring causation.

Controlled Experiments
  • Controlled experiments are more reliable because they can isolate cause and effect through manipulation and control of variables.

  • Knowing cause and effect allows informed design decisions based on empirical evidence.

  • Mere observation is unreliable due to potential biases and confounding factors that can distort the results.

  • Participants may rate a system easy to use for various reasons, not necessarily its usability, such as prior experience or personal preferences.

Comparative Evaluation
  • Including a baseline condition serves as a check on the methodology and facilitates the comparison of results between user studies.

  • Example: Comparing Method B to Method A, and including a baseline condition C (FIGURE 4.10) to provide a reference point.

Cause & Effect: Examples
  • Ice Cream Sales vs. Shark Attacks: A classic example of correlation without causation.

  • Age of Miss America vs. Murders by steam, hot vapors, and hot objects: Another spurious correlation.

  • Number of people who drowned by falling into a pool vs. Films Nicolas Cage appeared in: A humorous illustration of unrelated variables.

The Common Theme
  • These are examples that highlight the difference between correlation and causation, emphasizing the importance of critical thinking.

Correlation vs. Causation
  • Just because two variables are correlated does not mean that one causes the other; correlation does not imply causation.

  • Example: Storks and Birth Rates

    • A study showed a statistically significant correlation between stork populations and human birth rates across Europe (p = 0.008), but this does not mean storks bring babies.

    • Possible explanations:

      1. Children cause storks (e.g., crying babies attract storks), which is unlikely.

      2. Storks cause children (mythical explanation), which is not scientifically valid.

      3. A third unknown aspect causes both (e.g., a village environment is more friendly to storks and families that desire children), indicating a confounding variable.

Experiment in a Nutshell
  • Experiment: Does exam room temperature affect students' test scores?

    • Independent Variable (IV): Temperature (Manipulated; x-axis), the factor being changed.

    • Dependent Variable (DV): Test scores (Measured; y-axis), the outcome being measured.
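The temperature experiment above can be sketched as a comparison of the DV (test scores) across levels of the IV (room temperature). All numbers here are hypothetical, chosen only to show the structure of the analysis.

```python
from statistics import mean

# Hypothetical data: IV = room temperature condition, DV = test score.
scores = {
    "20C": [82, 78, 85, 80],  # comfortable room
    "30C": [70, 68, 74, 72],  # hot room
}

# Compare the DV across the manipulated levels of the IV.
means = {condition: mean(values) for condition, values in scores.items()}
print(means)  # {'20C': 81.25, '30C': 71}
```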

HCI Experiments: Independent and Dependent Variables
  • Example 1:

    • Independent Variable (IV): Type of keyboard, such as QWERTY vs. Dvorak.

    • Dependent Variable (DV): Typing speed, measured in words per minute.

  • Example 2:

    • Independent Variable (IV): Robot's facial expression, such as happy, sad, or neutral.

    • Dependent Variable (DV): Learning gain, measured by test scores or knowledge retention.

  • Example 3:

    • Independent Variable (IV): Technology (e.g., VR vs. AR), User (e.g., novice vs. expert), Context (e.g., office vs. home).

    • Dependent Variable (DV): Usability (e.g., task completion time, error rate), UX (e.g., satisfaction, enjoyment).

  • HCI experiments typically establish causal relationships that hold within specific contexts and user groups, rather than universal laws.

Research Question (RQ) and Hypothesis
  • Hypothesis: A prediction about the relationship between two or more variables, based on existing knowledge and theory.

  • Research Question (RQ): A specific question regarding the assumed relationship that the study tries to answer, guiding the research process.

    • Independent Variable influences Dependent Variable (Cause -> Effect), forming the basis of the investigation.

  • In HCI research, RQs are often about new or existing user interfaces or interaction techniques, aiming to improve usability and UX.

  • Variables can be any attribute or property of humans, systems, or environments that can be measured or manipulated.

  • Example: Using keyboard X improves typing speed, meaning keyboard X is expected to increase the rate at which users can type.

Hypothesis Operationalization
  • Hypotheses MUST be testable and measurable to allow for empirical evaluation.

  • Examples:

    • Keyboard X is better than keyboard Y in terms of typing speed and accuracy.

    • Adding element XX improves the usability of the interface, resulting in higher user satisfaction.

    • Adding auditory feedback in cars improves situational awareness in drivers, reducing reaction time to hazards.

  • Operationalization involves defining how these variables will be measured to ensure accurate and reliable results.

  • "Better" is operationalized as: increases typing speed → reduces errors / increases user performance, quantifying the impact of the independent variable.

  • Faster response to events in the driving simulator (e.g., braking car/pedestrian crosses road), providing a measurable outcome.
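Operationalized measures like those above can be written as explicit formulas. A minimal sketch, assuming the common convention that one "word" equals five typed characters; the session numbers are hypothetical.

```python
# Operationalizing "typing speed" and "accuracy" as measurable quantities.
def words_per_minute(chars_typed: int, seconds: float) -> float:
    # Convention: one word = five characters.
    return (chars_typed / 5) / (seconds / 60)

def error_rate(errors: int, chars_typed: int) -> float:
    return errors / chars_typed

# Hypothetical session: 300 characters in 60 seconds with 6 errors.
print(round(words_per_minute(300, 60), 1))  # 60.0 WPM
print(round(error_rate(6, 300), 2))         # 0.02
```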

Formulating RQs/Hypotheses
  • Literature review: Know the state of the art; Identify the research gap to avoid redundancy and build on existing knowledge.

  • Think small: Research must be practical and feasible; instead of focusing on a big idea, find a small idea that can be tested rigorously.

  • Think inside the box! Get inspired by existing designs; stick to established research methods to ensure validity and reliability.

Internal vs. External Validity
  • Internal Validity:

    • Accuracy of the Answer, ensuring the results are due to the manipulated variables.

    • Degree of confidence in the causal relationship, ruling out alternative explanations.

    • Not influenced by other factors or variables, controlling for confounding variables.

  • External Validity:

    • Breadth of Question, addressing the generalizability of the findings.

    • Generalizability of findings to other people, situations, and contexts.

    • Requires large samples and diverse populations to ensure the results are widely applicable.

  • Examples:

    • High Internal Validity, Low External Validity: “Is the measured entry speed (in words per minute) higher with the new technique than with QSK after one hour of use?” focuses on a specific scenario.

    • Low Internal Validity, High External Validity: “Is the new technique better than QSK?” asks a broad question without specifying conditions.

Correct Measurements
  • It is crucial to have correct measurements in research to ensure accuracy and reliability of the findings.

  • Examples: Measuring the length of a screw accurately or assessing intelligence using validated tests.

Scales of Measurement
  • Nominal: Mutually exclusive categories without any inherent order, such as car brands.

  • Ordinal: Categories with a meaningful order or ranking, such as ranking cars by comfort.

  • Interval: Equal distance between values but no absolute zero, such as temperature in Celsius or Fahrenheit.

  • Ratio: Equal distance between values and an absolute zero, such as time or distance.

Car Brands and Measuring “Comfort” - Examples
  • Nominal: Number of users of a specific car brand?

    • Outcome: 100 Toyota owners vs. 50 Ford owners. Problem: "avg comfort = (100 Toyota + 50 Ford) / 2" is meaningless; nominal data support counting and comparing frequencies, not averaging.

  • Ordinal: Ranking best car to worst car with respect to Comfort & Aesthetics (C&A)?

    • Outcome: Avg rank of Toyota > avg rank of Ford. Problem: how much better? Ordinal data reveal the order but not the size of the difference between ranks.

  • Interval: “Seats are amazing!”: Disagree (-1), Neutral (0), or Agree (1)?

    • Outcome: Avg value for Toyota: 0.8, avg value for Ford: 0.2. Problem: the difference (0.8 − 0.2 = 0.6) is meaningful, but claiming Toyota is "four times better" is nonsense because interval scales lack a true zero point, so ratios are undefined.

  • Ratio: How many seconds to turn on the car?

    • Outcome: Toyota, 1.2 sec on avg; Ford, 1.8 sec on avg. No problem! The overall avg time to turn on is 1.5 sec, and Toyota is "better" than Ford by exactly 0.6 sec, because ratio scales allow meaningful comparisons and calculations.

Mathematical Properties of Scales of Measurement
  • Nominal: No meaningful calculation can be performed.

  • Ordinal: Comparison (less/greater-than) is possible, but no mean can be calculated.

  • Interval: Mean can be calculated, but not ratios (e.g., “20°C is not twice as warm as 10°C”).

  • Ratio: All arithmetic operators (+, −, ×, ÷) and statistics (mean, standard deviation, variance, …) are valid. Grounded in physical values (time, distance, velocity, number of actions, …).
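The "20°C is not twice as warm as 10°C" point can be checked directly: converting Celsius (interval) to Kelvin (ratio, with a true zero) exposes the actual ratio.

```python
# Celsius has no true zero, so ratios of Celsius values are meaningless.
# Kelvin is a ratio scale: 0 K is absolute zero.
def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)
print(round(ratio, 3))  # 1.035 -- nowhere near "twice as warm"
```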

Scale Sophistication
  • Nominal (crude) < Ordinal < Interval < Ratio (sophisticated) - FIGURE 4.4, illustrating the increasing level of detail and mathematical properties.

Likert Scale
  • Example: Please indicate your level of agreement with the following statements:

    • It is safe to talk on a mobile phone while driving.

    • It is safe to read a text message on a mobile phone while driving.

    • It is safe to compose a text message on a mobile phone while driving.

  • Responses: Strongly disagree, Mildly disagree, Neutral, Mildly agree, Strongly agree (FIGURE 4.7), typically coded numerically.

  • Likert Scale data is typically treated as interval scale data for analysis purposes, although this is a subject of debate.
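Treating Likert responses as interval data means coding the labels numerically and averaging them, as sketched below; the 1-5 coding and the sample responses are illustrative assumptions (and, as noted above, averaging ordinal responses is itself debated).

```python
from statistics import mean

# Numeric coding of the 5-point response labels (an assumed, common scheme).
coding = {
    "Strongly disagree": 1, "Mildly disagree": 2, "Neutral": 3,
    "Mildly agree": 4, "Strongly agree": 5,
}

# Hypothetical responses to one statement from four participants.
responses = ["Neutral", "Mildly agree", "Strongly agree", "Mildly agree"]
score = mean(coding[r] for r in responses)
print(score)  # mean agreement = 4
```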

Choosing Measurements
  • Literature review to identify established and validated measures.

  • Consensus in a scientific community to ensure comparability and acceptance.

  • Validated scales and questionnaires to ensure reliability and validity of the results.

  • Results are comparable when using standardized measures.

  • If a validated scale does not exist, define it for your experiment, but be aware of the limitations.

    • “For this experiment, we define driving experience as years of having a driver’s license.”

    • Such ad-hoc definitions are quantified but less validated, making the results harder to compare with other studies.

Evaluation in HCI Research
  • Consensus in HCI community on key metrics for evaluating interactive systems:

    • Efficiency – Task completion time, measuring how quickly users can complete tasks.

    • Accuracy – error rate, measuring the number of errors users make while performing tasks.

    • Learnability – repeated measure of reaction time or performance, assessing how quickly users can learn to use the system.

  • Standardized questionnaires for assessing various aspects of user experience:

    • NASA-TLX (task load) to measure perceived mental and physical workload.

    • AttrakDiff (attractiveness of UI) to assess the aesthetic appeal of the user interface.

    • SUS (Usability) to measure the overall usability of the system.

    • UTAUT (acceptance and use of UI) to assess the factors influencing user acceptance and adoption of the interface.
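As an example of a standardized questionnaire, SUS has a fixed scoring procedure: ten items rated 1-5, odd items contribute (rating − 1), even items contribute (5 − rating), and the sum is multiplied by 2.5 to yield a 0-100 score. The ratings below are hypothetical.

```python
# Standard SUS scoring: 10 items, each rated 1 (strongly disagree)
# to 5 (strongly agree); yields a usability score from 0 to 100.
def sus_score(ratings: list[int]) -> float:
    assert len(ratings) == 10
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i=0 is item 1 (odd-numbered)
        for i, r in enumerate(ratings)
    )
    return total * 2.5

# Hypothetical participant: agrees with positive items, disagrees with negative.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```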

Academic Integrity
  • Cases like Prof. Francesca Gino highlight the importance of academic integrity in research, emphasizing the need for honesty and transparency.