Computational Linguistics and Probability Theory Notes
Computational Linguistics
- Definition: A field that combines linguistics and computer science, focusing on statistical description of natural language.
- Key Concepts: Involves probability and information theory as foundational elements to evaluate language processing.
Random Experiments and Events
- Random Experiment: An experiment whose outcomes cannot be precisely determined beforehand; can be repeated multiple times.
- Random Phenomenon: Statement about the outcome of a random experiment, determined post-experiment.
- Examples of Outcomes: Even number, number greater than 3.
- Elementary Phenomenon: Base outcomes that cannot be further decomposed.
- Sample Space: The set of all elementary phenomena, denoted $\Omega$.
Probability Basics
- Probability of Event A: Estimated by the relative frequency $P(A) = \frac{k}{N}$, where:
- N: Total number of trials.
- k: Number of occurrences of event A.
- Characteristics:
- Probability values range: $0 \le P(A) \le 1$.
- The certain event has probability 1: $P(\Omega) = 1$.
- Sum of Probabilities for Incompatible Events: $P(A \cup B) = P(A) + P(B)$ when $A$ and $B$ cannot occur together.
- Determining Probabilities: Can utilize combinatorial methods or statistical measurements from real data.
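The relative-frequency estimate $P(A) \approx k/N$ can be illustrated with a short simulation (a sketch only; the fair six-sided die and the event "even number" are taken from the examples above, the trial count is arbitrary):

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same trials

# Estimate P(A) for A = "die shows an even number" by relative frequency k/N.
N = 100_000
k = sum(1 for _ in range(N) if random.randint(1, 6) % 2 == 0)
p_A = k / N  # should be close to the true value 1/2
```

As $N$ grows, the estimate converges to the true probability $1/2$.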
Joint and Conditional Probability
- Joint Probability: Denoted as $P(A \cap B) = P(A, B)$.
- Conditional Probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, which is estimated from relative frequencies.
- Rearranged as: $P(A \mid B) = \frac{c(A,B)/N}{c(B)/N} = \frac{c(A,B)}{c(B)}$.
Word Probability Examples
- Examining Word Probabilities:
- For the word "today" in Czech, labelled as event A.
- Event B as occurrence of the word "respektive".
- Calculating Probabilities:
- $P(A) = \frac{c(A)}{N_w} = \frac{2}{4} = \frac{1}{2}$.
- $P(B) = \frac{c(B)}{N_w} = \frac{1}{4}$.
- Pair Probabilities:
- $P(A,B) = \frac{c(A,B)}{N_{w-1,w}} = \frac{1}{4}$.
- Conditional Probability: $P(B \mid A) = \frac{P(A,B)}{P(A)} = \frac{1/4}{1/2} = \frac{1}{2}$.
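The counts in this example can be reproduced from a toy token sequence (a sketch: the notes do not give the actual Czech text, so the corpus below is hypothetical, chosen so that "dnes" — Czech for "today", event A — occurs twice and "respektive" once; a start symbol pads the bigram count so that $N_{w-1,w} = N_w = 4$ as in the notes):

```python
from collections import Counter

tokens = ["dnes", "respektive", "dnes", "je"]  # hypothetical 4-word corpus
N_w = len(tokens)

unigrams = Counter(tokens)
p_A = unigrams["dnes"] / N_w        # c(A)/N_w = 2/4 = 1/2
p_B = unigrams["respektive"] / N_w  # c(B)/N_w = 1/4

# Pad with a start symbol so there are N_w bigram positions.
bigrams = Counter(zip(["<s>"] + tokens, tokens))
p_AB = bigrams[("dnes", "respektive")] / N_w  # c(A,B)/N_{w-1,w} = 1/4

p_B_given_A = p_AB / p_A  # P(B|A) = (1/4) / (1/2) = 1/2
```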
Random Variables
- Definition: Variable whose value is determined by the outcome of a random experiment, with discrete or continuous values.
- Discrete Random Variable: Countable outcomes.
- Continuous Random Variable: Takes values from an uncountable range (e.g. an interval of real numbers).
- Probability Distribution: Function describing the probabilities of individual outcomes. The distribution function is $F(x) = P(X \le x)$.
- Concept of Entropy: Measures uncertainty in the outcome of a random variable.
- More information leads to lower entropy.
- Formula: $H(X) = -\sum_x p(x) \log_2 p(x)$. For a coin toss, $H(X) = -\big( P(\text{head}) \log_2 P(\text{head}) + P(\text{tail}) \log_2 P(\text{tail}) \big)$.
- Example with a 32-sided die:
- With 32 equally likely outcomes, $H(X) = \log_2 32 = 5$ bits (greater uncertainty than the coin toss).
- Entropy Properties:
- Entropy (H) is non-negative: $0 \le H(X)$.
- Uniform distribution has maximum entropy.
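The coin and 32-sided-die entropies above can be verified with a few lines (a sketch; the `entropy` helper is not from the notes):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum p(x) * log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin = [0.5, 0.5]      # fair coin toss
die32 = [1 / 32] * 32  # fair 32-sided die

h_coin = entropy(coin)   # 1 bit
h_die32 = entropy(die32) # log2(32) = 5 bits
```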
Perplexity
- Definition: Measure of how difficult it is to predict outcomes of a random variable, $G(X) = 2^{H(X)}$.
- Examples of Perplexity:
- Variable with 2 equally likely outcomes: $H(X) = 1$ bit, $G(X) = 2$.
- Variable with 32 outcomes of varying probabilities has lower entropy, and hence lower perplexity, than the uniform 32-outcome case, making prediction easier.
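Perplexity as $G(X) = 2^{H(X)}$ can be computed directly (a sketch; the helper functions and the skewed example distribution are illustrative, not from the notes):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def perplexity(probs):
    """G(X) = 2**H(X): effective number of equally likely outcomes."""
    return 2 ** entropy(probs)

g_uniform = perplexity([0.5, 0.5])  # H = 1 bit, so G = 2
g_skewed = perplexity([0.9, 0.1])   # H < 1 bit, so G < 2
```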
Joint and Conditional Entropy
- Joint Entropy: Measures total uncertainty of two random variables $X$ and $Y$.
- Conditional Entropy: Measures remaining uncertainty of one variable given knowledge of the other, $H(Y \mid X)$; by the chain rule, $H(X,Y) = H(X) + H(Y \mid X)$.
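Joint and conditional entropy can be computed from a joint probability table (a sketch; the joint distribution below is hypothetical, and $H(Y \mid X)$ is obtained via the chain rule $H(Y \mid X) = H(X,Y) - H(X)$):

```python
import math

def h(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y).
p_xy = {
    ("a", "0"): 0.25, ("a", "1"): 0.25,
    ("b", "0"): 0.40, ("b", "1"): 0.10,
}

h_joint = h(p_xy.values())  # joint entropy H(X, Y)

# Marginal p(x) by summing over y.
p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0) + p
h_x = h(p_x.values())

# Chain rule: H(Y | X) = H(X, Y) - H(X).
h_y_given_x = h_joint - h_x
```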
General Properties of Language Entropy
- English Example: Different entropies depending on how characters are predicted (no character knowledge, previous character, character pairs, etc.).
- Entropy Decrease: Observing previous events lowers entropy.
Kullback-Leibler Divergence (Relative Entropy)
- Measures the difference between two probability distributions: $D(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$.
- Not symmetric: $D(p \| q) \neq D(q \| p)$.
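The asymmetry is easy to see numerically (a sketch; the two example distributions are illustrative):

```python
import math

def kl(p, q):
    """D(p || q) = sum p(x) * log2(p(x)/q(x)); assumes q(x) > 0 where p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

d_pq = kl(p, q)  # D(p || q)
d_qp = kl(q, p)  # D(q || p), a different value in general
```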
Cross Entropy
- Indicates bits required to describe one probability distribution using another: $C(p,q) = -\sum_x p(x) \log_2 q(x)$.
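Cross entropy connects the previous two sections: it decomposes as $C(p,q) = H(p) + D(p \| q)$, i.e. the true entropy plus the penalty for using the wrong distribution $q$. A sketch with illustrative distributions:

```python
import math

def entropy(p):
    return -sum(px * math.log2(px) for px in p if px > 0)

def kl(p, q):
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

def cross_entropy(p, q):
    """C(p, q) = -sum p(x) * log2 q(x)."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
c = cross_entropy(p, q)  # equals entropy(p) + kl(p, q)
```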
Mutual Information
- Quantifies the reduction in uncertainty of one random variable given knowledge of another: $I(X,Y) = H(X) - H(X \mid Y)$. If $X$ and $Y$ are independent, $I(X,Y) = 0$.
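Mutual information can be computed from a joint table via $I(X,Y) = H(X) - H(X \mid Y)$, using $H(X \mid Y) = H(X,Y) - H(Y)$ (a sketch; the joint distribution is hypothetical):

```python
import math

def h(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution over (X, Y); X and Y tend to agree.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x = [0.5, 0.5]  # marginal of X (sums over y)
p_y = [0.5, 0.5]  # marginal of Y (sums over x)

h_x = h(p_x)
h_x_given_y = h(p_xy.values()) - h(p_y)  # H(X | Y) = H(X, Y) - H(Y)

mi = h_x - h_x_given_y  # I(X, Y) > 0 since X and Y are dependent here
```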