Probability Axioms
The Probability Axioms are: 1) For any event A, 0 ≤ P(A) ≤ 1. 2) P(True) = 1, P(False) = 0. 3) For mutually exclusive events, P(A ∨ B) = P(A) + P(B). More generally, P(A ∨ B) = P(A) + P(B) - P(A ∧ B). 4) The sum of probabilities over all possible mutually exclusive outcomes in the sample space Ω is 1.
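A minimal Python sketch of the axioms on a tiny sample space (the two-coin example, the events A and B, and their probabilities are assumptions for illustration):

```python
from itertools import product

# Sample space Ω: outcomes of two fair coin flips; each world has probability 0.25.
omega = {w: 0.25 for w in product("HT", repeat=2)}
assert abs(sum(omega.values()) - 1.0) < 1e-9        # probabilities over Ω sum to 1

def prob(event):
    """P(event) = sum of the probabilities of the worlds where the event holds."""
    return sum(p for w, p in omega.items() if event(w))

A = lambda w: w[0] == "H"                           # first flip is heads
B = lambda w: w[1] == "H"                           # second flip is heads

assert 0.0 <= prob(A) <= 1.0                        # axiom 1
# Inclusion-exclusion: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
lhs = prob(lambda w: A(w) or B(w))
rhs = prob(A) + prob(B) - prob(lambda w: A(w) and B(w))
assert abs(lhs - rhs) < 1e-9
print(prob(A), lhs)                                 # 0.5 0.75
```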
Conditional Probability Definition
Conditional Probability P(B|A) is the probability of event B occurring given that event A has occurred. It is defined as: P(B|A) = P(A ∧ B) / P(A), provided P(A) > 0. This leads to the chain rule: P(A ∧ B) = P(B|A) * P(A).
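A minimal sketch of the definition and the chain rule, with made-up values for P(A) and P(A ∧ B):

```python
p_a = 0.4            # P(A), assumed value
p_a_and_b = 0.1      # P(A ∧ B), assumed value

p_b_given_a = p_a_and_b / p_a                      # P(B|A) = P(A ∧ B) / P(A)
assert abs(p_b_given_a * p_a - p_a_and_b) < 1e-9   # chain rule: P(A ∧ B) = P(B|A) * P(A)
print(p_b_given_a)   # 0.25
```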
Law of Total Probability
The Law of Total Probability states that for an event B and a set of mutually exclusive and exhaustive events {Aᵢ} (a partition), P(B) = Σᵢ P(B | Aᵢ) P(Aᵢ). It "marginalizes" over the Aᵢ to compute the total probability of B.
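A minimal sketch of marginalizing over a three-event partition (the numbers are assumptions):

```python
p_A = [0.5, 0.3, 0.2]              # P(Aᵢ) for a partition, so they sum to 1
p_B_given_A = [0.9, 0.5, 0.1]      # P(B | Aᵢ), assumed values

assert abs(sum(p_A) - 1.0) < 1e-9
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))   # Σᵢ P(B|Aᵢ) P(Aᵢ)
print(p_B)                         # 0.45 + 0.15 + 0.02 = 0.62
```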
Bayes' Theorem (Derivation)
Bayes' Theorem is derived from the chain rule: P(A ∧ B) = P(B|A)P(A) = P(A|B)P(B). Rearranging gives P(B|A) = [P(B) * P(A|B)] / P(A). Using the law of total probability for the denominator: P(A) = Σᵢ P(A|Bᵢ)P(Bᵢ).
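A minimal sketch of the rearranged formula using hypothetical disease-test numbers (the prior, sensitivity, and false-positive rate are assumptions):

```python
# P(B|A) = P(A|B) P(B) / Σᵢ P(A|Bᵢ) P(Bᵢ), with the partition {B, ¬B}.
p_disease = 0.01                       # prior P(B), assumed
p_pos_given_disease = 0.95             # likelihood P(A|B), assumed
p_pos_given_no_disease = 0.05          # P(A|¬B), assumed

# Denominator via the law of total probability.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))   # ≈ 0.161
```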
Probabilistic Inference: Purpose
Probabilistic Inference uses probability theory to derive new conclusions (update beliefs) from observed data and prior knowledge while capturing uncertainty. It answers queries of the form P(Query | Evidence). Simple pairwise correlations suffice for small problems, but real-world problems require modeling complex networks of dependencies (e.g., Bayesian Networks).
Inference with Joint Probability Distribution
A brute-force inference method uses the full joint probability distribution (a truth table for all variables). To find P(Query | Evidence), one sums the probabilities of all worlds consistent with Query ∧ Evidence and divides by the sum for worlds consistent with Evidence. This is sound and complete but does not scale, as the table size is exponential in the number of variables.
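A minimal sketch of brute-force enumeration over a hypothetical three-variable joint table (the probabilities are made up but sum to 1):

```python
joint = {  # (A, B, C) -> probability of that world
    (True, True, True): 0.05,   (True, True, False): 0.10,
    (True, False, True): 0.15,  (True, False, False): 0.10,
    (False, True, True): 0.20,  (False, True, False): 0.10,
    (False, False, True): 0.05, (False, False, False): 0.25,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

def query(query_fn, evidence_fn):
    """P(query | evidence) by enumerating every world in the joint table."""
    p_evidence = sum(p for w, p in joint.items() if evidence_fn(w))
    p_both = sum(p for w, p in joint.items() if evidence_fn(w) and query_fn(w))
    return p_both / p_evidence

# Example: P(A=true | C=true)
print(query(lambda w: w[0], lambda w: w[2]))   # 0.20 / 0.45 ≈ 0.444
```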
Conditional Law of Total Probability
The Conditional Law of Total Probability computes P(A|B) by marginalizing over a partition {Cᵢ}: P(A|B) = Σᵢ P(A | B, Cᵢ) * P(Cᵢ | B). This is useful when the conditional probabilities given both B and Cᵢ are easier to obtain than P(A|B) directly. It requires Σᵢ P(Cᵢ|B) = 1.
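A minimal sketch with assumed values for P(Cᵢ|B) and P(A|B,Cᵢ):

```python
p_C_given_B = [0.7, 0.3]               # P(Cᵢ|B); must sum to 1
p_A_given_B_C = [0.2, 0.9]             # P(A | B, Cᵢ), assumed values

assert abs(sum(p_C_given_B) - 1.0) < 1e-9
p_A_given_B = sum(pa * pc for pa, pc in zip(p_A_given_B_C, p_C_given_B))  # Σᵢ P(A|B,Cᵢ) P(Cᵢ|B)
print(p_A_given_B)                     # 0.14 + 0.27 = 0.41
```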
Bayesian Network (Definition)
A Bayesian Network (Bayes Net) is a directed acyclic graph (DAG) where: 1) Nodes represent random variables. 2) Edges represent direct probabilistic dependencies. 3) Each node has a conditional probability table (CPT) quantifying its dependence on its parents. It compactly encodes the joint distribution via the chain rule: P(X₁,…,Xₙ) = Πᵢ P(Xᵢ | Parents(Xᵢ)).
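A minimal sketch of a three-node network M → H ← S with hypothetical CPTs, checking that the chain-rule factorization yields a proper joint distribution:

```python
p_M = {True: 0.1, False: 0.9}          # prior CPT for M (assumed numbers)
p_S = {True: 0.3, False: 0.7}          # prior CPT for S (assumed numbers)
p_H_given_MS = {                       # CPT for H given its parents M and S
    (True, True): 0.9,  (True, False): 0.7,
    (False, True): 0.6, (False, False): 0.1,
}

def joint(m, s, h):
    """Chain rule over the network: product of each node's CPT entry."""
    p_h = p_H_given_MS[(m, s)] if h else 1 - p_H_given_MS[(m, s)]
    return p_M[m] * p_S[s] * p_h

# The eight entries of the implied joint distribution sum to 1.
total = sum(joint(m, s, h) for m in (True, False)
                           for s in (True, False)
                           for h in (True, False))
print(round(total, 10))   # 1.0
```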
Bayesian Network: Computing a Conditional Probability (Step 1)
To compute P(M|H) using a Bayes Net: Step 1: Apply Bayes' Theorem: P(M|H) = [P(M) * P(H|M)] / P(H). This breaks the problem into finding a prior P(M), a likelihood P(H|M), and evidence P(H).
Bayesian Network: Computing Likelihood P(H|M) (Step 2)
Step 2: Compute the likelihood P(H|M) using the conditional law of total probability, marginalizing over another variable S (e.g., Stress): P(H|M) = P(H|M,S)P(S|M) + P(H|M,¬S)P(¬S|M). If M and S are independent (no direct edge), P(S|M) = P(S), so P(H|M) = P(H|M,S)P(S) + P(H|M,¬S)P(¬S).
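A minimal sketch of Step 2 with assumed CPT entries, treating M and S as independent:

```python
p_S = 0.3                  # P(S), assumed prior
p_H_given_M_S = 0.9        # P(H | M, S), assumed
p_H_given_M_notS = 0.7     # P(H | M, ¬S), assumed

# P(H|M) = P(H|M,S) P(S) + P(H|M,¬S) P(¬S)
p_H_given_M = p_H_given_M_S * p_S + p_H_given_M_notS * (1 - p_S)
print(p_H_given_M)         # 0.27 + 0.49 = 0.76
```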
Bayesian Network: Computing Evidence P(H) (Step 3)
Step 3: Compute evidence P(H) using the law of total probability, marginalizing over all combinations of parent variables (e.g., M and S): P(H) = Σ_{m,s} P(H|M=m, S=s) * P(M=m) * P(S=s). This sums over all ways H can occur, weighted by the probabilities of its parent states.
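A minimal sketch of Step 3 with the same assumed priors and CPT:

```python
p_M, p_S = 0.1, 0.3                       # assumed priors P(M), P(S)
p_H_given = {                             # assumed CPT P(H | M=m, S=s)
    (True, True): 0.9,  (True, False): 0.7,
    (False, True): 0.6, (False, False): 0.1,
}

# P(H) = Σ_{m,s} P(H|M=m, S=s) P(M=m) P(S=s)
p_H = sum(p_H_given[(m, s)]
          * (p_M if m else 1 - p_M)
          * (p_S if s else 1 - p_S)
          for m in (True, False) for s in (True, False))
print(round(p_H, 4))   # 0.027 + 0.049 + 0.162 + 0.063 = 0.301
```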
Bayesian Network: Final Substitution (Step 4)
Step 4: Substitute the computed P(H|M) and P(H) back into Bayes' Theorem: P(M|H) = [P(M) * P(H|M)] / P(H). This yields the updated (posterior) probability of M given the observed evidence H.
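A minimal sketch of Step 4, plugging in the quantities computed in the Step 2 and Step 3 sketches (all numbers are assumptions):

```python
p_M = 0.1            # prior P(M), assumed
p_H_given_M = 0.76   # likelihood from the Step 2 sketch
p_H = 0.301          # evidence from the Step 3 sketch

p_M_given_H = p_M * p_H_given_M / p_H    # Bayes' Theorem
print(round(p_M_given_H, 4))             # ≈ 0.2525; observing H raises belief in M above the 0.1 prior
```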
Bayesian Network Properties (DAG & CPTs)
Key properties: 1) The graph is a Directed Acyclic Graph (DAG) (no cycles). 2) Each node's Conditional Probability Table (CPT) specifies P(node | parents). 3) The network encodes conditional independence relationships, making it memory efficient compared to a full joint distribution (but inference can still be computationally hard).
Advantages of Bayesian Networks
Advantages include: 1) Compact representation of complex joint distributions. 2) Intuitive visualization of dependencies for human interpretation. 3) Efficient inference (relative to full joint tables) for many queries. 4) Integration of prior knowledge (priors) and evidence via Bayes' Theorem for updating beliefs.
Advantages of Bayes' Theorem
Advantages of Bayes' Theorem for reasoning: 1) Provides a mathematically sound framework for updating beliefs with evidence. 2) Incorporates prior knowledge (the prior P(Hypothesis)). 3) Quantifies uncertainty in conclusions. 4) The update is iterative: the posterior from one update can become the prior for the next as new evidence arrives.