Comprehensive practice flashcards covering probability foundations, probabilistic graphical models, HMM algorithms (Viterbi, Forward, Backward, Baum-Welch), and biological applications like GENSCAN and CpG Islands.
Sample Space
The set of all possible outcomes of an experiment, which can be discrete (e.g., {A, C, G, T}) or continuous (e.g., an interval such as (0, N)).
Elementary Events
Outcomes that are atomic, mutually exclusive, and collectively exhaustive.
Atomic Events
Events that cannot be broken down further into smaller components.
Mutually Exclusive
A relationship between events where only one can occur at a time.
Collectively Exhaustive
A set of events that together cover the entire sample space, so at least one of them must occur; for elementary events, their probabilities sum to exactly 1.
Probability Axiom 1 (Non-negativity)
The rule stating that there are no negative probabilities: P(E) ≥ 0 for every event E.
Probability Axiom 2 (Normalization)
The requirement that the sum of probabilities for all outcomes in a sample space must equal exactly 1.
Addition Rule (Axiom 3)
If two events are mutually exclusive, the probability of one or the other occurring is the sum of their individual probabilities: P(E1 or E2) = P(E1) + P(E2).
Joint Probability
The probability that two different random variables take on specific values at the same time, denoted as P(X,Y).
Marginal Probability
The probability of one variable taking a value regardless of the other, calculated by "marginalizing out" (summing over) the other variable: P(X = x) = Σ_y P(X = x, Y = y).
Conditional Probability
The probability that event X happens given that event Y has already happened, denoted as P(X∣Y).
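The joint, marginal, and conditional definitions above can be checked on a tiny made-up table (a sketch; all numbers here are hypothetical):

```python
# Toy joint distribution P(X, Y) over X, Y in {0, 1} (hypothetical numbers).
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Marginal: "marginalize out" Y by summing over it.
p_x1 = sum(p for (x, y), p in joint.items() if x == 1)  # P(X=1) = 0.3 + 0.4

# Conditional: P(Y=1 | X=1) = P(X=1, Y=1) / P(X=1)
p_y1_given_x1 = joint[(1, 1)] / p_x1
print(round(p_x1, 2), round(p_y1_given_x1, 3))  # 0.7 0.571
```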
Independence
The property where two variables do not impact each other, mathematically defined by P(X, Y) = P(X) × P(Y), or equivalently P(X∣Y) = P(X).
Bayes' Rule
A formula used for "The Great Inversion" to find an inverse probability: P(S∣X) = P(X∣S) × P(S) / P(X).
Prior Belief
In Bayes' Rule, the initial belief (P(S)) before seeing new data.
Normalizing Factor
The denominator in Bayes' Rule (P(X)) that ensures the total probability equals 1.
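The three Bayes cards above fit in a few lines of code; a minimal sketch with hypothetical numbers, where S might stand for a hidden state such as "CpG island":

```python
def bayes(p_x_given_s, p_s, p_x_given_not_s):
    """P(S | X) via Bayes' rule for a binary S (toy two-hypothesis case)."""
    # Normalizing factor P(X), obtained by marginalizing over S and not-S.
    p_x = p_x_given_s * p_s + p_x_given_not_s * (1 - p_s)
    return (p_x_given_s * p_s) / p_x

# Prior belief P(S) = 0.1; likelihoods P(X|S) = 0.8 and P(X|not S) = 0.2.
posterior = bayes(0.8, 0.1, 0.2)
print(round(posterior, 3))  # 0.308
```

Note how a strong likelihood ratio (0.8 vs 0.2) still yields a modest posterior because the prior is small.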
Probabilistic Graphical Models (PGMs)
A map of a joint probability distribution used to model large biological systems by encoding independence assumptions.
Nodes
Represented as circles in a PGM, each node corresponds to a specific random variable.
Edges
Directed arrows in a graph indicating a conditional dependency (e.g., an edge X → Y means one must know X to determine Y).
Roots
Nodes in a PGM with no parents, representing prior independent events.
Parents
In a relationship X → Y, X is the parent node.
Children
In a relationship X → Y, Y is the child node.
Graphical Factoring Rule
The joint probability of a whole system is the product of every node given its parents: P(X1, …, Xn) = Π_i P(Xi ∣ Parents(Xi)).
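The factoring rule can be illustrated with a minimal two-node network A → B (all probability tables here are made up):

```python
# Root node A has no parents, so it carries a prior; B is conditioned on A.
p_a = {"T": 0.3, "F": 0.7}
p_b_given_a = {("T", "T"): 0.9, ("T", "F"): 0.1,
               ("F", "T"): 0.2, ("F", "F"): 0.8}

def joint(a, b):
    # Product over nodes of P(node | parents): P(A) * P(B | A)
    return p_a[a] * p_b_given_a[(a, b)]

# Sanity check: the factored joint distribution still sums to 1.
total = sum(joint(a, b) for a in "TF" for b in "TF")
print(total)
```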
Maximum Likelihood Estimation (MLE)
A process of parameter estimation using real biological data to find probabilities by counting and normalizing.
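"Counting and normalizing" can be sketched for transition probabilities on a toy sequence (the sequence and numbers below are illustrative, not from the source):

```python
from collections import Counter

seq = "ACGTACGG"  # toy DNA sequence (hypothetical)
# Count adjacent pairs (k -> l transitions).
counts = Counter(zip(seq, seq[1:]))
# Normalize: MLE is a_kl = count(k -> l) / count(k -> anything).
totals = Counter(k for k, _ in counts.elements())
a = {(k, l): c / totals[k] for (k, l), c in counts.items()}

print(a[("A", "C")])  # every A in this toy sequence is followed by C -> 1.0
```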
First Order Markov Property
The assumption that the "future" depends only on the "present" and not the past: P(Xi+1 ∣ Xi, Xi−1, …, X0) = P(Xi+1 ∣ Xi).
Markov Model (Chain)
A graphical model that looks like a line of arrows used to calculate the probability of a specific sequence (e.g., DNA).
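Under a first-order chain, a sequence probability is just the start probability times the chain of transitions; a sketch with made-up numbers (only the transitions needed for the example are listed):

```python
# Uniform start distribution over the four bases (hypothetical).
start = {b: 0.25 for b in "ACGT"}
# Partial, hypothetical transition table; a full model's rows would sum to 1.
trans = {("A", "C"): 0.4, ("C", "G"): 0.1}

def chain_prob(seq):
    # P(seq) = P(x1) * product of P(x_{i+1} | x_i)
    p = start[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[(prev, cur)]
    return p

print(chain_prob("ACG"))  # 0.25 * 0.4 * 0.1 = 0.01
```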
CpG Islands
Genomic regions near gene promoters where cytosine methylation is suppressed, avoiding the methyl-C → T deamination that depletes "CG" dinucleotides elsewhere and making "CG" sequences more common.
Hidden Markov Models (HMMs)
A model used when the biological function (hidden states) is not visible, but symbols (emissions) are observed.
Doubly Stochastic
A property of HMMs representing two levels of randomness: State Transition and Symbol Emission.
Initial Distribution (π)
The probability of starting in each specific hidden state at the first position of a sequence (i = 1).
Transition Matrix (A)
A K × K matrix where each entry a_kl is the probability of moving from state k to state l, with each row summing to 1.
Emission Matrix (E)
A K × B matrix where e_k(b) is the probability that state k emits symbol b, with each state's emission probabilities summing to 1.
The Decoding Problem
The challenge of identifying the single most likely sequence of hidden states that produced a sequence of observations.
Viterbi Algorithm
A dynamic programming algorithm that solves the decoding problem in time linear in the sequence length, using a Trellis Diagram.
Trellis Diagram
A graph where columns represent positions in the DNA sequence and rows represent hidden states, used by the Viterbi algorithm.
Viterbi Traceback
The process of following back-pointers from the final maximum score to reconstruct the path of hidden states.
Log-space Math
The practice of taking the logarithm of probabilities to prevent computational underflow when multiplying many tiny numbers.
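The Viterbi, trellis, traceback, and log-space cards above can be combined into one sketch. The two-state HMM ("I" for island-like, "B" for background) and all its probabilities are made up for illustration:

```python
import math

states = ["I", "B"]
pi = {"I": 0.5, "B": 0.5}
A = {("I", "I"): 0.9, ("I", "B"): 0.1, ("B", "I"): 0.1, ("B", "B"): 0.9}
E = {("I", "C"): 0.4, ("I", "G"): 0.4, ("I", "A"): 0.1, ("I", "T"): 0.1,
     ("B", "C"): 0.2, ("B", "G"): 0.2, ("B", "A"): 0.3, ("B", "T"): 0.3}

def viterbi(obs):
    # v[k]: best log-probability of any path ending in state k (log-space
    # sums replace products, preventing underflow on long sequences).
    v = {k: math.log(pi[k]) + math.log(E[(k, obs[0])]) for k in states}
    ptrs = []  # one trellis column of back-pointers per position
    for x in obs[1:]:
        ptr, nv = {}, {}
        for l in states:
            best = max(states, key=lambda k: v[k] + math.log(A[(k, l)]))
            ptr[l] = best
            nv[l] = v[best] + math.log(A[(best, l)]) + math.log(E[(l, x)])
        ptrs.append(ptr)
        v = nv
    # Traceback: follow back-pointers from the highest-scoring final state.
    path = [max(states, key=v.get)]
    for ptr in reversed(ptrs):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

print(viterbi("CGCG"))  # IIII under these toy parameters
```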
Posterior Decoding
A local decoding method that considers all possible paths passing through a state at a specific position to find its probability.
Forward Algorithm
Calculates the total probability f_k(i) of all paths that reach state k at position i by summing across previous states.
Backward Algorithm
Calculates the probability b_k(i) by working in reverse from the end of the sequence to the current position, predicting the present based on the future.
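A quick consistency check between the two: run each over the same toy two-state HMM (all parameters hypothetical) and both should return the same total probability P(sequence):

```python
states = ["I", "B"]
pi = {"I": 0.5, "B": 0.5}
A = {("I", "I"): 0.9, ("I", "B"): 0.1, ("B", "I"): 0.1, ("B", "B"): 0.9}
E = {("I", "C"): 0.4, ("I", "G"): 0.4, ("I", "A"): 0.1, ("I", "T"): 0.1,
     ("B", "C"): 0.2, ("B", "G"): 0.2, ("B", "A"): 0.3, ("B", "T"): 0.3}

def forward(obs):
    # f_k(i): total probability of all paths reaching state k at position i.
    f = {k: pi[k] * E[(k, obs[0])] for k in states}
    for x in obs[1:]:
        f = {l: sum(f[k] * A[(k, l)] for k in states) * E[(l, x)] for l in states}
    return sum(f.values())  # P(observations), summed over final states

def backward(obs):
    # b_k(i): probability of emitting the rest of the sequence from state k.
    b = {k: 1.0 for k in states}
    for x in reversed(obs[1:]):
        b = {k: sum(A[(k, l)] * E[(l, x)] * b[l] for l in states) for k in states}
    return sum(pi[k] * E[(k, obs[0])] * b[k] for k in states)

p = forward("CGTA")
print(abs(p - backward("CGTA")) < 1e-12)  # True: both give P(sequence)
```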
Baum-Welch Algorithm
An expectation-maximization (EM) algorithm used to estimate HMM parameters when labeled data is not provided.
Expectation Step (E)
The step in Baum-Welch where the forward-backward algorithm is used to calculate the expected counts of transitions and emissions.
Maximization Step (M)
The step in Baum-Welch where the expected (fractional) counts from the E-step are used to update the Transition (A) and Emission (E) matrices.
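The E-step's expected counts can be sketched by combining forward and backward tables. The HMM parameters below are hypothetical, only expected transition counts are shown, and an M-step would then normalize each row of counts into a new A:

```python
states = ["I", "B"]
pi = {"I": 0.5, "B": 0.5}
A = {("I", "I"): 0.9, ("I", "B"): 0.1, ("B", "I"): 0.1, ("B", "B"): 0.9}
E = {("I", "C"): 0.4, ("I", "G"): 0.4, ("I", "A"): 0.1, ("I", "T"): 0.1,
     ("B", "C"): 0.2, ("B", "G"): 0.2, ("B", "A"): 0.3, ("B", "T"): 0.3}

def expected_transitions(obs):
    n = len(obs)
    # Forward table: f[i][k] = P(obs[:i+1], state i = k)
    f = [{k: pi[k] * E[(k, obs[0])] for k in states}]
    for x in obs[1:]:
        f.append({l: sum(f[-1][k] * A[(k, l)] for k in states) * E[(l, x)]
                  for l in states})
    # Backward table: b[i][k] = P(obs[i+1:] | state i = k)
    b = [dict.fromkeys(states, 1.0)]
    for x in reversed(obs[1:]):
        b.append({k: sum(A[(k, l)] * E[(l, x)] * b[-1][l] for l in states)
                  for k in states})
    b.reverse()
    p_obs = sum(f[-1][k] for k in states)
    # E-step: expected number of k -> l transitions, summed over positions.
    counts = {kl: 0.0 for kl in A}
    for i in range(n - 1):
        for k in states:
            for l in states:
                counts[(k, l)] += (f[i][k] * A[(k, l)]
                                   * E[(l, obs[i + 1])] * b[i + 1][l]) / p_obs
    return counts

c = expected_transitions("CGTA")
print(round(sum(c.values()), 6))  # 3.0: expected counts over 3 transitions
```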
GENSCAN
A generalized HMM (GHMM) developed by Burge and Karlin in 1997 to predict gene structure in human DNA.
Duration Modeling
A feature of GHMMs allowing state lengths to be modeled explicitly (e.g., average exon length of 180 bp) rather than decaying exponentially.
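The contrast this card draws can be seen numerically: a plain HMM self-transition a_kk implies a geometrically decaying length distribution with mean 1/(1 − a_kk), so matching an average exon length of 180 bp forces a specific a_kk (the arithmetic below is illustrative):

```python
# Geometric duration implied by a self-transition: mean length = 1 / (1 - a_kk).
# To mimic an average length of ~180 bp, a plain HMM needs a_kk = 1 - 1/180;
# a GHMM instead models the 180 bp length distribution explicitly.
a_kk = 1 - 1 / 180
expected_len = 1 / (1 - a_kk)
print(round(expected_len))  # 180
```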
Local Maxima
A potential pitfall of the Baum-Welch algorithm where the model converges on a good solution that is not the absolute best (global) solution.
Ultrametricity
A property that assumes a molecular clock where all leaves in a phylogenetic tree are equidistant from the root.
Additivity
A property that does not assume a molecular clock (mutation counts proportional to time), producing unrooted trees that require an outgroup to place the root.
Affine Gap
A 3-matrix dynamic programming approach where different penalties are applied for opening versus extending a gap.
Long-branch Attraction
An error where sequences with many mutations are incorrectly grouped together because they appear similar by chance.
NJ Algorithm
The Neighbor-Joining algorithm developed by Saitou and Nei used for building phylogenetic trees.