Quantifying Uncertainty

Acting Rationally

Rational agents with perfect knowledge of the environment*

‣ can find an optimal solution by exploring the complete environment;

‣ can find a good (but perhaps suboptimal) solution by exploring part of the environment using heuristics.

*: rarely the entire environment

Acting Rationally under Uncertanty

What should rational agents do if they don’t have perfect information? (Poker)

‣ Maximize performance by keeping track of the relative importance of different outcomes and the likelihood that these outcomes will be achieved.

Dealing with Uncertanty

Logic is insufficient

‣ toothache ⇒ cavity ?

‣ toothache ⇒ cavity ∨ gum problem

‣ toothache ⇒ cavity ∨ gum problem ∨ abscess

‣ toothache ⇒ cavity ∨ gum problem ∨ abscess ∨ sinus block ∨ …

‣ cavity ⇒toothache ?

Only an exhaustive list of possibilities on the right hand side will make the rule true.

Logic is insufficient

‣ Laziness: it’s too much work to make and use the rules.

‣ Theoretical Ignorance: we don’t know everything there is to know.

‣ Practical Ignorance: we don’t have access to all of the information.

→ Replace certainty (logic) with degrees of belief (probability)

Probability Theory

‣ Probability statements are usually made with regard to a knowledge state.

‣ Actual state: patient has a cavity or patient does not have a cavity

‣ Knowledge state: probability that the patient has a cavity if we haven’t observed her yet.

Decision Theory

Decision theory = probability theory + utility theory

Principle of maximum expected utility (MEU)

‣ An agent is rational if and only if it chooses the action that yields the highest expected utility. Expected = average of outcome utilities, weighted by probability of the outcome

Acting under Uncertanity

What if: You have to choose between a 0.8 chance of getting 4000 EUR, and a 100 % chance of getting 3000 EUR ( and a weel, utility, probabilty, expected utility)

Probability Terminology Map

Possible Worlds

‣ The term possible worlds originates in philosophy the actual world could have been different. in reference to ways in which

‣ In statistics and AI, we use it to refer to the possible states of whatever we are trying to represent, for example, the possible configurations of a chess board or the possible outcomes of throwing a die.

‣ The term world is limited to the problem we aretrying to represent.

‣ A possible world (ω(lowercase omega)) is a state that the world could be in. capital omega

‣ The set of possible worlds ( Ω(capital omega) ) includes all the states that the world could be in. In other words, Ω must be exhaustive.

‣ Each possible world must be different from all the other possible worlds. In other words, possible worlds must be mutually exclusive.

Sample space

Ex. Throwing dice

Sample space = Set of all possible worlds: Ω

‣ Ω = {(1,1), (1,2), … (6,5), (6,6)}

Possible world = element of the sample space: ω

‣ w ‣ ω₁ = (1,1)

‣ ω₃₆ = (6,6)

0 ≤ P(ω) ≤ 1 for every ω and∑ P(ω) = 1

Random variables and Events

Random variable

‣ Function that maps from a set of possible worlds to a domain or range.

‣ Typically written with upper case letter

Event

‣ Set of worlds in which a proposition holds

‣ Probability of an event: sum of probabilities of the worlds in which a proposition holds Example

‣ Random variable: Total with range {2, …, 12}

‣ Proposition "rolling 11 with two dice": P(Total = 11)

‣ Event: set of worlds in which the proposition holds: {(5,6),(6,5)}

‣ Probability of event: P((5,6)) + P((6,5)) = 1/36 + 1/36 = 1/18

(Un)conditional Probabilities

Unconditional probabilities: Degree of belief in propositions in the absence of other information.

‣ Also known as prior probabilities or priors.

Conditional probabilities: Degree of belief given other information.

‣ Example: rolling a double if the first dice is 5

‣ P (doubles ∣ Dice1 = 5

Conditional Probabilities

Computing conditional probabilities

Joint probability distribution:

P(Toothache, Cavity) = P(Toothache ∣ Cavity)P(Cavity)

‣ Boldface P means “for all possible values of the random variable”.

‣ A probability model is completely determined by the full joint probability distribution (the joint distribution for all of the random variables).

‣ E.g., P(Cavity, Toothache, Catch) = 2x2x2 table

Probability Axioms

Probabilistic Inference

P(cavity ∨toothache)= 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

Extracting unconditional probabilities(marginalization)

Computing conditional probabilities

Normalization constant

Normalization

A general inference procedure

General form of the procedure described in previous slide:

P(X ∣ e) = α P(X,e) = α∑ P(X,e,y)

→ Can answer questions about the probability distribution of a discrete random variable X, given evidence variables E, and unobserved variables Y.

Does not scale well. For n variables with two values each:

‣ space complexity = O(2n)

‣ time complexity = O(2n)

‣ (i.e., complexity doubles with every additional variable)

Adding extra variables

Full joint distribution with 32 elements…

Independence

‣ Assumptions about independence are usually based on domain knowledge.

‣ Independence drastically reduces the amount of information needed to specify the full joint distribution

For instance: rolling 5 dice

‣ Full joint distribution: 65 = 7776

‣ Five single variable distributions: 6 * 5 = 30

Conditional Independence

Example:

‣ Catch and toothache are not independent: if the probe catches, then it is likely that the tooth has a cavity and that this cavity causes a toothache.

‣ However, toothache and catch are independent, given the presence or absence of a cavity.

• If a cavity is present, then whether there is a toothache is not dependent on whether the probe catches, and vice versa.

• If a cavity is not present, then whether there is a toothache is not dependent on whether the probe catches, and vice versa.

→ P(toothache, catch | cavity) = P(toothache | cavity)P(catch | cavity)

P(X,Y|Z) = P(X|Z) P(Y|Z)

Bayes’ Rule- Derivation

Bayes’ Rule

Bayes’ Rule - Diagnostic application

Determining the probability of a cause given a certain effect (diagnosis).

‣ Example: what is the probability that you ate a magic mushroom if you are hallucinating?

A patient comes into the hospital with hallucinations after lunch.

‣ Magic mushrooms cause hallucinations 70% of the time.

‣ The prior probability that someone ate magic mushrooms for lunch is 1/50,000.

‣ The prior probability that someone who comes into the hospital has hallucinations is 1%.

Bayes’ Rule- Causal application

Determining the probability of an effect given a certain cause (medication).

‣ Example: what is the probability that you will hallucinate when eat a magic mushroom?

A devoted researcher does experiments with psychoactive substances once a month. Over the course of the last year, the researcher did 12 experiments

‣ the researcher hallucinated 9 times out of 12.

‣ out of the 10 times the researcher was hallucinating, two were attributable to magic mushroom use.

‣ half of the experiments involved magic mushrooms.

Scaling up inference?

Neither inferring from the full joint distribution, nor a straightforward application of Bayes’ rule scale up well

‣ Conditional independence assertions can decompose full joint distribution into smaller pieces.

‣ If we assume that a single cause influences a number of independent effects → Naive Bayes model

Summary

‣ Logic is insufficient to act rationally under uncertainty.

‣ Decision theory states that under uncertainty, the best action is the one that maximizes the expected utility of the outcomes.

‣ Probability theory formalizes the notions we require to infer the expected utility of actions under uncertainty.

‣ Given a full joint probability distribution, we can formalize a general inference procedure.

‣ Bayes’ rule allows for inferences about unknown probabilities from conditional probabilities. ‣ Neither the general inference procedure, nor Bayes’ rule scale up well.

‣ Assuming conditional independence allows for the full joint probability distribution to be factored into smaller conditional distributions → Naive Bayes.