What is the "curse of dimensionality" in the context of probability?
It refers to the fact that representing a full joint distribution of d binary variables requires 2^d - 1 independent parameters. The number of parameters grows exponentially with the number of variables.
If you have 3 binary variables (X, Y, Z), how many terms are needed to represent their full joint distribution?
2^3 = 8 probabilities must be listed (or 2^3 - 1 = 7 independent parameters, since the probabilities must sum to 1).
What is a joint probability distribution?
A function that gives the probability of every possible combination of values for a set of random variables.
What is a random variable?
A variable whose value is determined by the outcome of a random phenomenon (e.g., Rain = Yes or No).
What is a binary variable?
A random variable that can only take two possible values, such as 0/1, True/False, or Yes/No.
Write the Chain Rule of Probability for n variables.
P(X1, X2, …, Xn) = P(X1) * P(X2|X1) * P(X3|X1, X2) * … * P(Xn|X1, X2, …, Xn-1)
What does the symbol P(A|B) represent?
The conditional probability of event A occurring given that event B has already occurred.
In the chain rule, what does the final term P(Xn | X1, X2, …, Xn-1) represent?
The probability of the last variable Xn, conditioned on the specific values of all the variables that came before it.
In the jar example with 3 red and 1 blue ball, what does P(B1 = red) mean and what is its value?
It is the probability that the first ball drawn is red. Its value is 3/4.
In the jar example, what does P(B2 = red | B1 = red) mean and what is its value?
It is the probability that the second ball is red, given that the first ball drawn was red. Its value is 2/3.
In the jar example, what does the term "without replacement" mean?
It means that after the first ball is drawn, it is not put back into the jar before the second draw, which changes the probabilities for the second draw.
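The jar example above can be checked with a minimal sketch: apply the chain rule P(B1=red, B2=red) = P(B1=red) * P(B2=red | B1=red), then cross-check by enumerating every ordered pair of draws without replacement.

```python
from fractions import Fraction
from itertools import permutations

# Jar with 3 red balls and 1 blue ball, drawn without replacement.
balls = ["red", "red", "red", "blue"]

# Chain rule: P(B1=red, B2=red) = P(B1=red) * P(B2=red | B1=red)
p_b1_red = Fraction(3, 4)
p_b2_red_given_b1_red = Fraction(2, 3)
p_joint = p_b1_red * p_b2_red_given_b1_red  # 3/4 * 2/3 = 1/2

# Cross-check by enumerating all ordered draws of two balls.
draws = list(permutations(balls, 2))
p_enumerated = Fraction(sum(d == ("red", "red") for d in draws), len(draws))

print(p_joint, p_enumerated)  # both 1/2
```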
What are the two main components of a Bayesian Network?
1) A Directed Acyclic Graph (DAG) whose nodes are random variables, and 2) a Conditional Probability Table (CPT) for each node, giving its distribution conditioned on its parents.
In a Bayesian Network graph, what do the nodes and edges represent?
Nodes represent random variables. Edges (arrows) represent direct influence or a "causal" relationship from a parent node to a child node.
What does DAG stand for and what are its properties?
DAG stands for Directed Acyclic Graph. It is "directed" meaning edges have arrows, and "acyclic" meaning there are no cycles (you cannot follow arrows and return to your starting point).
What is the fundamental factorization rule for a Bayesian Network?
P(x1, x2, …, xn) = ∏ P(xi | parents(xi)) for i=1 to n. (The joint distribution is the product of each node's conditional probability given its parents).
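The factorization rule can be sketched on the Wet Grass network used later in this deck (Cloudy → Sprinkler, Cloudy → Rain, Sprinkler/Rain → Wet Grass). The CPT numbers below are illustrative assumptions, not values from the deck; the point is that multiplying each node's conditional probability yields a valid joint distribution.

```python
from itertools import product

# Toy network: Cloudy -> Sprinkler, Cloudy -> Rain, (Sprinkler, Rain) -> WetGrass.
# CPT values are made-up for illustration.
p_c = {True: 0.5, False: 0.5}                      # P(Cloudy)
p_s = {True: 0.1, False: 0.5}                      # P(Sprinkler=T | Cloudy)
p_r = {True: 0.8, False: 0.2}                      # P(Rain=T | Cloudy)
p_w = {(True, True): 0.99, (True, False): 0.9,     # P(WetGrass=T | Sprinkler, Rain)
       (False, True): 0.9, (False, False): 0.0}

def joint(c, s, r, w):
    """P(c, s, r, w) = P(c) * P(s|c) * P(r|c) * P(w|s, r)."""
    ps = p_s[c] if s else 1 - p_s[c]
    pr = p_r[c] if r else 1 - p_r[c]
    pw = p_w[(s, r)] if w else 1 - p_w[(s, r)]
    return p_c[c] * ps * pr * pw

# The probabilities of all 2^4 assignments must sum to 1.
total = sum(joint(*a) for a in product([True, False], repeat=4))
print(total)
```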
In the Bayesian Network factorization, what does "parents(xi)" refer to?
The set of nodes in the graph that have a direct arrow pointing into node xi.
In the Naïve Bayes model, what is the key assumption encoded in the graph?
All input features (evidence variables X_i) are conditionally independent of each other given the class label Y.
Write the factorization formula for a Naïve Bayes model.
P(Y, X1, X2, …, Xn) = P(Y) * ∏ P(Xi | Y) for i=1 to n.
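A minimal Naïve Bayes sketch using this factorization: the class names, feature names, and all probabilities below are hypothetical, chosen only to show how P(Y) * ∏ P(Xi | Y) is computed and normalized into a posterior.

```python
# Toy Naive Bayes classifier with illustrative, made-up probabilities.
p_y = {"spam": 0.3, "ham": 0.7}                      # prior P(Y)
p_x_given_y = {                                      # P(X_i = True | Y) per feature
    "spam": {"free": 0.8, "winner": 0.6},
    "ham":  {"free": 0.1, "winner": 0.05},
}

def posterior(features):
    """P(Y | X) is proportional to P(Y) * prod_i P(X_i | Y); normalize at the end."""
    scores = {}
    for y, prior in p_y.items():
        score = prior
        for name, present in features.items():
            p = p_x_given_y[y][name]
            score *= p if present else 1 - p
        scores[y] = score
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

print(posterior({"free": True, "winner": True}))  # "spam" dominates for these inputs
```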
What is a Conditional Probability Table (CPT)?
A table that shows the probability distribution over a node's values for each possible combination of its parents' values.
In the Wet Grass example, which variables are the parents of "Wet Grass"?
Sprinkler and Rain.
In the Wet Grass example, how many independent parameters does the CPT for "Wet Grass" have and why?
It has 4 independent parameters: one probability P(W = true | S, R) for each of the 2^2 = 4 combinations of its parents' values (Sprinkler and Rain).
How many parameters would a full joint distribution require for 4 binary variables?
2^4 - 1 = 15 independent parameters.
How many total parameters does the Wet Grass Bayesian Network require, and how many does it save?
It requires 9 parameters (C:1, S:2, R:2, W:4). It saves 15 - 9 = 6 parameters compared to the full joint distribution.
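The parameter-count comparison above can be sketched in a few lines: a binary node with k binary parents needs 2^k independent parameters, one per parent combination.

```python
# Parameter counting for networks of binary variables.
def full_joint_params(n):
    """Independent parameters in a full joint over n binary variables."""
    return 2 ** n - 1

def bn_params(num_parents_per_node):
    """Each binary node needs 2^(#parents) independent parameters."""
    return sum(2 ** k for k in num_parents_per_node)

# Wet Grass network: Cloudy (0 parents), Sprinkler (1), Rain (1), WetGrass (2).
full = full_joint_params(4)      # 15
bn = bn_params([0, 1, 1, 2])     # 1 + 2 + 2 + 4 = 9
print(full, bn, full - bn)       # 15 9 6
```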
Define unconditional independence.
Two variables A and B are independent if P(A, B) = P(A) * P(B), or equivalently, P(A|B) = P(A). Knowing B tells you nothing about A.
Define conditional independence.
Two variables A and B are conditionally independent given a third variable C if P(A, B | C) = P(A | C) * P(B | C). This is written as (A ⟂ B) | C.
What is the notation for "A is independent of B given C"?
(A ⟂ B) | C
How many parameters are needed for the full joint P(A,B,C) if all are binary, using the chain rule P(C)P(A|C)P(B|A,C)?
1 + 2 + 4 = 7 parameters.
How many parameters are needed for the joint if A is independent of B given C, i.e., P(C)P(A|C)P(B|C)?
1 + 2 + 2 = 5 parameters.
State the Local Markov Property in your own words.
A node is conditionally independent of its non-descendants, given its parents.
In the Local Markov Property, what are considered "non-descendants" of a node X?
All nodes that are not descendants of X (i.e., not its children, grandchildren, etc.) and not X itself. X's parents are technically non-descendants too, but they form the conditioning set.
Explain the intuition behind the Local Markov Property.
"I only need to know my immediate parents to ignore my non-descendants." Once you know the parents' values, other unrelated nodes provide no extra information.
What is a Markov Blanket?
The set of nodes that completely shields a node X from the rest of the network. X is independent of all other nodes given its Markov blanket.
What three groups of nodes make up the Markov blanket of a node X?
1) The parents of X, 2) the children of X, and 3) the other parents of X's children (the co-parents).
Explain the intuition behind the Markov Blanket.
"I only need to know my Markov blanket (parents, children, and co-parents) to ignore everything else in the network."
What is D-separation?
A graphical criterion used to determine if two sets of nodes are conditionally independent given a third set of observed nodes.
What does it mean if two nodes A and C are d-separated by a set of observed nodes Z?
It means that A and C are conditionally independent given Z. (A ⟂ C | Z)
What are the three basic structures that determine if a path is blocked?
The Chain (A → B → C), the Fork / common cause (A ← B → C), and the Collider / common effect (A → B ← C).
In a Chain (A → B → C), when is the path blocked?
The path is blocked if the middle node B is in the observed set Z.
In a Fork (A ← B → C), when is the path blocked?
The path is blocked if the middle node B is in the observed set Z.
In a Collider (A → B ← C), when is the path blocked?
The path is blocked if neither the middle node B nor any of its descendants is in the observed set Z. Observing B (or any descendant of B) unblocks the path.
What is the "explaining away" phenomenon?
It occurs in a common effect structure. If the effect is observed, confirming one cause reduces the probability of the other cause, as it "explains away" the observation.
In the structure S (Sprinkler) → W (Wet Grass) ← R (Rain), what happens to the independence of S and R if W is observed?
If W is observed, S and R become dependent (explaining away).
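Explaining away can be demonstrated numerically with a minimal sketch of the collider S → W ← R. The priors and CPT values are made-up assumptions; the comparison shows that once W = true is observed, also learning R = true lowers the probability of S.

```python
from itertools import product

# Toy collider: Sprinkler -> WetGrass <- Rain, with illustrative CPTs.
p_s_true, p_r_true = 0.3, 0.3                       # priors P(S=T), P(R=T)
p_w = {(True, True): 0.99, (True, False): 0.9,      # P(W=T | S, R)
       (False, True): 0.9, (False, False): 0.0}

def joint(s, r, w):
    ps = p_s_true if s else 1 - p_s_true
    pr = p_r_true if r else 1 - p_r_true
    pw = p_w[(s, r)] if w else 1 - p_w[(s, r)]
    return ps * pr * pw

def prob(query, given):
    """P(query | given) by summing the joint over consistent assignments."""
    num = den = 0.0
    for s, r, w in product([True, False], repeat=3):
        a = {"S": s, "R": r, "W": w}
        p = joint(s, r, w)
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in query.items()):
                num += p
    return num / den

p_s_given_w = prob({"S": True}, {"W": True})
p_s_given_w_r = prob({"S": True}, {"W": True, "R": True})
print(p_s_given_w, p_s_given_w_r)  # learning R=True lowers the probability of S
```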
Write the factorization for a simple direct influence A → B.
P(A, B) = P(A) * P(B|A)
Write the factorization for an indirect influence (chain) A → B → C.
P(A, B, C) = P(A) * P(B|A) * P(C|B)
Write the factorization for a common cause (fork) A ← B → C.
P(A, B, C) = P(B) * P(A|B) * P(C|B)
Write the factorization for a common effect (collider) A → B ← C.
P(A, B, C) = P(A) * P(C) * P(B|A, C)
In a common cause structure (A ← B → C), what is the independence relationship if B is given?
A and C are independent given B. (A ⟂ C | B)
In a chain structure (A → B → C), what is the independence relationship if B is given?
A and C are independent given B. (A ⟂ C | B)
In a common effect structure (A → B ← C), what is the independence relationship if B is NOT given?
A and C are independent. (A ⟂ C)
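The chain case (A ⟂ C | B) among the structures above can be verified by enumeration. The CPT numbers are illustrative assumptions; once B is fixed, the conditional probability of C is the same whether or not A is also fixed.

```python
from itertools import product

# Chain A -> B -> C with made-up CPTs; check (A ⟂ C | B) numerically.
p_a = 0.4                                   # P(A=T)
p_b = {True: 0.7, False: 0.2}               # P(B=T | A)
p_c = {True: 0.9, False: 0.1}               # P(C=T | B)

def joint(a, b, c):
    pa = p_a if a else 1 - p_a
    pb = p_b[a] if b else 1 - p_b[a]
    pc = p_c[b] if c else 1 - p_c[b]
    return pa * pb * pc

def cond(num_fixed, den_fixed):
    """P(num_fixed | den_fixed) by enumeration over the full joint."""
    num = den = 0.0
    for a, b, c in product([True, False], repeat=3):
        vals = {"A": a, "B": b, "C": c}
        p = joint(a, b, c)
        if all(vals[k] == v for k, v in den_fixed.items()):
            den += p
            if all(vals[k] == v for k, v in num_fixed.items()):
                num += p
    return num / den

# With B observed, C no longer depends on A: the two conditionals match.
lhs = cond({"C": True}, {"B": True, "A": True})
rhs = cond({"C": True}, {"B": True, "A": False})
print(lhs, rhs)  # both equal P(C=T | B=T) = 0.9
```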
State one advantage of Bayesian Networks related to visualization.
They provide a graphical representation that offers a visual and intuitive way to understand complex relationships between variables.
State one advantage of Bayesian Networks related to knowledge types.
They can combine prior knowledge (from experts) with statistical information from data.
What is a key disadvantage of Bayesian Networks regarding knowledge?
They often require prior knowledge to specify many probabilities, which can be difficult to obtain.
What is a key disadvantage of Bayesian Networks regarding computation?
Performing exact inference can be computationally intractable (very difficult or impossible) for large, complex networks.
Write the formula for the Chain Rule of Probability.
P(X1, X2, …, Xn) = P(X1) * P(X2|X1) * P(X3|X1, X2) * … * P(Xn|X1, X2, …, Xn-1)
Write the general factorization formula for a Bayesian Network.
P(x1, x2, …, xn) = ∏ P(xi | parents(xi))
Write the factorization formula for a Naïve Bayes model.
P(Y, X1, X2, …, Xn) = P(Y) * ∏ P(Xi | Y)
Write the formula for unconditional independence.
P(A, B) = P(A) * P(B)
Write the formula for conditional independence.
P(A, B | C) = P(A | C) * P(B | C)
Write the formula for the Local Markov Property.
X ⟂ Non-descendants | Parents(X)
Write the formula for the Markov Blanket property.
X ⟂ All other nodes | MarkovBlanket(X)