Principles of Mixed-Initiative User Interfaces

The Debate: Direct Manipulation versus Interface Agents

  • Historical Tension: UI research has traditionally diverged into two camps:
      * Direct Manipulation: Focusing on metaphors and tools that enhance a user's ability to directly manipulate objects to access information and invoke services.
      * Interface Agents: Focusing on machinery for sensing user activity and performing automated tasks.
  • The Mixed-Initiative Goal: Rather than focusing on one approach, mixed-initiative design seeks synergies between the two. The objective is to avoid using complex reasoning to fix poor design, while also not limiting interfaces to manual manipulation when automation could provide significant efficiency.
  • Definition of Mixed-Initiative: A collaborative approach where intelligent services and users work together efficiently to achieve the user's goals.

12 Principles for Mixed-Initiative User Interfaces

  1. Developing Significant Value-Added Automation: Automated services must provide genuine value above what is attainable through direct manipulation alone.
  2. Considering Uncertainty about a User’s Goals: Systems should employ machinery to infer and exploit uncertainty regarding user intentions and focus, as computers are rarely certain of user goals.
  3. Considering the Status of a User’s Attention in the Timing of Services: Agents should model user attention and evaluate the costs and benefits of deferring action to less distracting times.
  4. Inferring Ideal Action in Light of Costs, Benefits, and Uncertainties: Automated actions should be guided by the expected value of taking action, considering the context-dependent costs of assisting versus disturbing the user.
  5. Employing Dialog to Resolve Key Uncertainties: When uncertain, the system should engage in efficient dialog, weighing the cost of "bothering" the user versus the benefit of clarification.
  6. Allowing Efficient Direct Invocation and Termination: Because systems will make errors under uncertainty, users must be able to easily trigger or shut down automated services manually.
  7. Minimizing the Cost of Poor Guesses about Action and Timing: UI designs should include features like automatic timing out and natural gestures for rejecting service to minimize the friction of false positives.
  8. Scoping Precision of Service to Match Uncertainty: Agents should gracefully degrade the precision of their service. "Doing less" correctly (e.g., showing a week view instead of a specific hour) is better than forcing a user to undo a specific but incorrect action.
  9. Providing Mechanisms for Efficient Agent-User Collaboration to Refine Results: Designs should assume users will want to complete or refine an agent's initial analysis.
  10. Employing Socially Appropriate Behaviors for Agent-User Interaction: Agents should possess tasteful default behaviors and courtesies that align with the social expectations of a "benevolent assistant."
  11. Maintaining Working Memory of Recent Interactions: Systems should remember recent interactions so that users can make natural references to objects and services from their shared short-term experience.
  12. Continuing to Learn by Observing: Automated services should improve over time by continuously learning from the user’s specific goals and needs.

Case Study: The Lookout System

  • Context: Lookout is a research testbed that overlays automated scheduling services onto Microsoft Outlook.
  • Functionality: It identifies new email messages brought into focus and assists the user in reviewing their calendar or composing appointments.
  • The Parsing Process (a sketch of the date normalization appears after this list):
      * Lookout parses the body and subject of an email to identify dates and times.
      * Anchor Dates: It uses the date the message was sent to normalize relative dates (e.g., if a message from yesterday says "tomorrow," Lookout understands that means "today").
      * Temporal Implications: The system understands vague temporal phrases like "sometime tomorrow," "later in the week," "in May," or prototypical times like "morning," "afternoon," or "evening."
      * Recurrent Events: It recognizes social temporal markers like "at breakfast," "grab lunch," or "meet for dinner."
  • Graceful Degradation: If a specific time/date cannot be identified, the system degrades its goal to identifying a relevant span of time (day, week, or month) and displays that scoped view of the calendar to the user.
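A minimal sketch of anchor-date normalization and graceful degradation, assuming a hypothetical resolve_date helper and toy phrase tables (the real parser handles a much richer grammar of dates, times, and recurrent events):

```python
from datetime import date, timedelta

# Toy phrase tables; Lookout's actual grammar is much richer.
RELATIVE_DAYS = {"today": 0, "tomorrow": 1}
VAGUE_SPANS = {"later in the week": "week", "in may": "month"}

def resolve_date(phrase: str, sent: date) -> tuple[str, date]:
    """Return (precision, anchor) for a temporal phrase found in a message.

    precision is 'day', 'week', or 'month' and scopes the calendar view;
    anchor is the date the view should center on.
    """
    phrase = phrase.lower().strip()
    if phrase in RELATIVE_DAYS:
        # Precise: normalize against the *sent* date, so "tomorrow" in a
        # message sent yesterday correctly resolves to today.
        return "day", sent + timedelta(days=RELATIVE_DAYS[phrase])
    if phrase in VAGUE_SPANS:
        # Imprecise: degrade to a scoped span instead of guessing a slot,
        # which the user would otherwise have to undo.
        return VAGUE_SPANS[phrase], sent
    return "week", sent  # Fallback: show the week around the sent date.

# "tomorrow" in a message sent yesterday means today.
print(resolve_date("tomorrow", date.today() - timedelta(days=1)))
```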

Decision Making and Interaction Modalities

  • Probabilistic Classification: Lookout processes email headers, subjects, and bodies to assign a probability that a user wants to schedule an appointment. This is done via a probabilistic classifier trained by watching the user's email habits.
  • Action Options: Based on the inferred probability and cost-benefit assessments, the system chooses one of three paths (see the dispatch sketch after this list):
      1. Do nothing: Wait for manual manipulation.
      2. Engage in dialog: Ask the user if they want help.
      3. Perform service: Automatically invoke the scheduling analysis.
  • Interaction Modalities:
      * Manual Modality: Action occurs only if the user clicks the Lookout icon in the system tray.
      * Alerting Mode: A red check mark appears on the icon if the system would have acted but is in manual mode; hovering provides a summary of the intended action.
      * Automated-Assistance Mode: Launches and populates Outlook appointment windows directly.
      * Social-Agent Modality: Uses MS Agent (animated characters) to query the user, supporting hands-free operation via text-to-speech (TTS) and automatic speech recognition (ASR).
      * Natural Language Recognition: Accepts acknowledgments like "yes," "yeah," "sure," or "do it," and rejections like "no," "not now," "nah," or "go away."
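A minimal sketch of the three-way choice, assuming the two threshold probabilities have already been derived from the utilities covered in the next section (the function and parameter names here are hypothetical):

```python
# Hypothetical sketch of Lookout's three-way choice: compare the inferred
# probability that the user wants to schedule against two thresholds
# derived from outcome utilities (see the framework below).
def choose_action(p_goal: float, p_inaction_dialog: float,
                  p_dialog_action: float) -> str:
    if p_goal < p_inaction_dialog:
        return "do nothing"          # Wait for manual invocation.
    if p_goal < p_dialog_action:
        return "engage in dialog"    # Ask whether the user wants help.
    return "perform service"         # Invoke scheduling analysis directly.

# Example: with thresholds of 0.30 and 0.70, a 0.55 probability triggers dialog.
print(choose_action(0.55, 0.30, 0.70))
```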

Mathematical Framework for Expected Utility

  • Inference: The system calculates the probability p(G|E), where G is the user's goal and E is the evidence (the text of the email).
  • Outcome Utilities: Decisions are based on the utilities of four deterministic outcomes, mapped onto a scale from 0.0 to 1.0:
      * u(A, G): System takes action; user has the goal (benefit of correct action).
      * u(A, ¬G): System takes action; user does not have the goal (cost of a false positive).
      * u(¬A, G): System does not act; user has the goal (cost of a false negative).
      * u(¬A, ¬G): System does not act; user does not have the goal (correct inaction).
  • Expected Utility Equations:
      * Expected utility of acting: EU(A|E) = p(G|E)u(A, G) + [1 - p(G|E)]u(A, ¬G)
      * Expected utility of not acting: EU(¬A|E) = p(G|E)u(¬A, G) + [1 - p(G|E)]u(¬A, ¬G)
  • Threshold Probability (p*): The system acts if p(G|E) > p*, where p* is the probability at which EU(A|E) = EU(¬A|E).
  • Dialog Thresholds: By adding dialog (D) as an option with its own utilities, u(D, G) and u(D, ¬G), the system derives two thresholds:
      * p*_{I,D}: Threshold between Inaction and Dialog.
      * p*_{D,A}: Threshold between Dialog and Action.
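Setting EU(X|E) = EU(Y|E) for any two options X and Y and solving for p(G|E) gives the boundary probability p* = [u(Y, ¬G) - u(X, ¬G)] / ([u(X, G) - u(Y, G)] + [u(Y, ¬G) - u(X, ¬G)]). A minimal sketch of that computation; the utility values below are illustrative placeholders, not the paper's assessed values:

```python
# Solve EU(X|E) = EU(Y|E) for p(G|E), where
# EU(X|E) = p(G|E)u(X, G) + [1 - p(G|E)]u(X, ~G).
def threshold(u_x_g: float, u_x_ng: float, u_y_g: float, u_y_ng: float) -> float:
    """p(G|E) at which options X and Y have equal expected utility."""
    return (u_y_ng - u_x_ng) / ((u_x_g - u_y_g) + (u_y_ng - u_x_ng))

# Illustrative utilities on the 0.0-1.0 scale: action A, dialog D,
# inaction ~A, each under goal G and no goal ~G.
u_A_G, u_A_nG   = 1.0, 0.2   # acting: ideal if wanted, costly false positive
u_D_G, u_D_nG   = 0.8, 0.7   # dialog: helpful if wanted, mildly bothersome if not
u_nA_G, u_nA_nG = 0.1, 1.0   # inaction: costly false negative, ideal otherwise

p_ID = threshold(u_D_G, u_D_nG, u_nA_G, u_nA_nG)  # Inaction -> Dialog boundary
p_DA = threshold(u_A_G, u_A_nG, u_D_G, u_D_nG)    # Dialog -> Action boundary
print(f"p*_ID = {p_ID:.2f}, p*_DA = {p_DA:.2f}")  # 0.30 and 0.71 here
```

With these placeholder utilities, probabilities below 0.30 yield inaction, probabilities between 0.30 and 0.71 trigger dialog, and anything higher triggers automatic service, matching the three-way dispatch sketched earlier.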

User Attention and Life-Long Learning

  • Attention Modeling: Lookout uses a sigmoid function to model the relationship between message length (in bytes) and the preferred "dwell time" before the system offers help: users generally need more time to read longer messages before being interrupted (a sketch of this model follows this list).
  • Learning Mechanisms:
      * Text Classification: Uses a linear Support Vector Machine (SVM) approximation (Platt's method) to estimate probabilities; the current version was trained on 1,000 messages (500 relevant, 500 irrelevant). A Platt-scaling sketch also follows this list.
      * Automated Refinement: The system stores messages as calendar-relevant if a user invokes a scheduling facility within a specific time horizon after reading them.
      * Custom Timing: Users can set the system to rebuild its timing model using regression analysis based on their specific behavior, or choose a fixed delay.
      * Behavioral Courtesies: If a user does not respond to a dialog, the system waits through a timeout period (proportional to the inferred probability), makes a respectful, apologetic gesture, and disappears.
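A minimal sketch of the sigmoid dwell-time model; the midpoint, steepness, and maximum-wait parameters are invented for illustration rather than taken from Lookout's fitted model:

```python
import math

# Sigmoid mapping message length (bytes) to the dwell time the agent
# should wait before offering service. Parameter values are illustrative.
def dwell_time_seconds(length_bytes: int, midpoint: float = 2000.0,
                       steepness: float = 0.002, max_wait: float = 20.0) -> float:
    """Longer messages earn longer pauses before the agent interrupts."""
    return max_wait / (1.0 + math.exp(-steepness * (length_bytes - midpoint)))

for size in (200, 2000, 8000):
    print(size, "bytes ->", round(dwell_time_seconds(size), 1), "s")
```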
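And a minimal sketch of Platt-style probability output for a linear text classifier, assuming a toy bag-of-words feature vector; the weights and the sigmoid parameters A and B are hypothetical stand-ins for values learned in training:

```python
import math

def svm_margin(features: dict, weights: dict, bias: float) -> float:
    """Raw decision value of a linear SVM: w . x + b."""
    return sum(weights.get(term, 0.0) * count
               for term, count in features.items()) + bias

def platt_probability(margin: float, A: float = -1.5, B: float = 0.0) -> float:
    """Platt scaling: map a raw margin to p(G|E) = 1 / (1 + exp(A*margin + B))."""
    return 1.0 / (1.0 + math.exp(A * margin + B))

# Toy learned weights: scheduling-flavored terms push the score up.
weights = {"meet": 0.9, "tomorrow": 0.7, "lunch": 0.6, "fyi": -0.8}
features = {"meet": 1, "tomorrow": 1}  # bag-of-words counts from the message
p = platt_probability(svm_margin(features, weights, bias=-0.2))
print(f"p(user wants to schedule) = {p:.2f}")
```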