Lecture Notes on Power Laws, Normal Distribution, Association Rules, and Boosting
Power Law vs. Normal Distribution
Power Law
- Reflects a slow drop-off in likelihood, so values far from the typical range remain plausible.
- Dominated by outliers.
- Relationship of the form: P(x) \propto x^{-a}
- a is the exponent (or "power") that controls how quickly the probability drops.
- Common in social and virtual-world contexts: wealth, sales, web traffic, YouTube views, word frequencies, and clicks.
- Associated with "Extremistan," where the impact of power is considered significant.
- Highly asymmetric: a small share of items accounts for most of the sales, views, or wealth.
- Extreme values (very large or very small) are far more common than under a normal distribution.
- Heavy-tailed distributions.
- Plotting the logarithm of frequency against the logarithm of value (a log-log plot) produces a straight line with slope -a.
- Example:
- Assume wealth follows a power law; each doubling of wealth makes it four times rarer (see the sketch after this list):
- Having more than €8 million: ~1 in 4,000 people
- Having more than €16 million: ~1 in 16,000 people
- Having more than €32 million: ~1 in 64,000 people
- Having more than €320 million: ~1 in 6.4 million people
- Incomes Example: If two people jointly make $1M/year, the most likely split is highly unequal, e.g., one person making $950K and the other $50K.
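A minimal sketch reproducing the wealth odds above, assuming a Pareto-style tail P(X > x) \propto x^{-2} (the exponent these figures imply); the calibration constant is chosen to match the first line of the example:

```python
# Assumed Pareto-style tail: P(X > x) = C * x**(-a), with a = 2
# (the exponent implied by "doubling wealth -> four times rarer").
a = 2.0
C = (1 / 4000) * (8e6 ** a)  # calibrate so that P(X > 8M euros) = 1 in 4,000

for x in [8e6, 16e6, 32e6, 320e6]:
    p = C * x ** (-a)
    print(f"P(X > {x / 1e6:.0f}M euros) = 1 in {1 / p:,.0f}")
# -> 1 in 4,000; 1 in 16,000; 1 in 64,000; 1 in 6,400,000
```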
80/20 Rule
- 80% of the effects come from 20% of the causes.
- 20% of people hold 80% of the wealth.
- 20% of products generate 80% of sales.
- 20% of videos get 80% of views.
Normal Distribution
- Common in the physical world, such as weight, height, cholesterol, and blood counts.
- Associated with "Mediocristan," emphasizing the middle of the curve.
- Observations cluster around the mean, and the probability of a deviation declines rapidly as you move away from it.
- Thin-tailed distributions.
- Incomes Example: If two people jointly make $1M/year, the most likely split is $500K each — variation is minimal and symmetric.
- Has a well-defined mean and standard deviation, which remain stable regardless of the range of the data.
- The mean is the center of the bell curve, and the standard deviation controls the width.
- Example (assuming a standard deviation of about 10 cm; see the sketch after this list):
- Average adult height is 1.67 meters.
- Ten centimeters taller (1.77 meters): about 1 in 6.3 people.
- Twenty centimeters taller (1.87 meters): about 1 in 44 people.
- Thirty centimeters taller (1.97 meters): about 1 in 740 people.
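A minimal sketch reproducing these odds, assuming heights are normal with mean 1.67 m; the standard deviation of 0.10 m is an assumption inferred from the quoted odds, not stated in the notes. Requires scipy:

```python
from scipy.stats import norm

mean, sd = 1.67, 0.10  # the SD is an assumption inferred from the quoted odds
for height in [1.77, 1.87, 1.97]:
    p = norm.sf(height, loc=mean, scale=sd)  # tail probability P(X > height)
    print(f"P(height > {height} m) = 1 in {1 / p:.1f}")
# -> 1 in 6.3, 1 in 44.0, 1 in 740.8: the normal tail thins out quickly
```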
Association Rules
- Used to discover relationships among variables in large databases.
- Example: Customers who buy bread often also buy butter.
- Itemset: A collection of one or more items.
- Transaction: A set of items bought together (shopping basket).
- Frequent itemset: An itemset whose support meets or exceeds a predefined threshold.
Key Metrics
- Support
- The proportion of transactions in the dataset that contain a particular itemset.
- For a rule A \rightarrow B: the fraction of transactions that contain both A and B.
- Formula: Support(A \rightarrow B) = P(A \cap B)
- Example: If 20 out of 100 transactions include both milk and bread, then the support is 0.20.
- Confidence
- How often items in B appear in transactions that contain A.
- The proportion of transactions containing A that also contain B.
- Formula: Confidence(A \rightarrow B) = P(B | A) = \frac{Support(A \cap B)}{Support(A)}
- Measures the strength of the implication.
- Lift
- A measure of how much more likely B is to occur when A has occurred, compared to B occurring independently.
- Formula: Lift(A \rightarrow B) = \frac{Confidence(A \rightarrow B)}{Support(B)} = \frac{P(B|A)}{P(B)}
- Lift > 1: Positive correlation (A and B occur more together than expected).
- Lift = 1: A and B are independent.
- Lift < 1: Negative correlation (A and B occur together less than expected). A worked sketch computing all three metrics follows this list.
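A minimal sketch computing support, confidence, and lift for the hypothetical rule {bread} \rightarrow {butter}; the items and baskets are made up for illustration:

```python
# A toy basket dataset (hypothetical items and transactions).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "milk"},
]
n = len(transactions)

# Metrics for the rule {bread} -> {butter}.
support_A  = sum("bread" in t for t in transactions) / n               # P(A)
support_B  = sum("butter" in t for t in transactions) / n              # P(B)
support_AB = sum({"bread", "butter"} <= t for t in transactions) / n   # P(A and B)

confidence = support_AB / support_A  # P(B | A)
lift       = confidence / support_B  # P(B | A) / P(B)

print(f"support={support_AB:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
# -> support=0.40, confidence=0.50, lift=0.83
#    (lift < 1: slightly negative association in this toy data)
```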
Apriori Algorithm
- Used to identify all frequent itemsets in a dataset.
- Apriori property: every subset of a frequent itemset must also be frequent.
- Support threshold: Used to eliminate infrequent itemsets early.
- New itemsets are generated by joining frequent itemsets of size k to create sets of size k + 1.
- Can be computationally expensive with large datasets.
- Pruning: skip larger combinations if any of their subsets have already been found infrequent.
- Example: If milk and eggs are not frequent, then milk, bread, and eggs are also not frequent.
- Uses a bottom-up approach, extending frequent itemsets one item at a time (see the sketch after this list).
- Requires a minimum support threshold.
- Lowering the minimum support increases the number of rules generated.
- A rule is strong only if it has both high support and high confidence.
- The support of a rule can never be greater than its confidence, since Confidence(A \rightarrow B) = Support(A \cap B) / Support(A) and Support(A) \le 1.
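A compact sketch of the algorithm under the definitions above; the baskets and the min_support value are hypothetical, and real implementations optimize candidate generation far more aggressively:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets, extending them one item at a time."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    levels = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

    k = 1
    while levels[-1]:
        candidates = set()
        for a in levels[-1]:
            for b in levels[-1]:
                union = a | b
                # Keep only (k+1)-sets whose k-subsets are all frequent (pruning).
                if len(union) == k + 1 and all(
                    frozenset(sub) in levels[-1] for sub in combinations(union, k)
                ):
                    candidates.add(union)
        levels.append({c for c in candidates if support(c) >= min_support})
        k += 1

    return [itemset for level in levels for itemset in level]

baskets = [{"milk", "bread"}, {"bread", "butter"},
           {"milk", "bread", "butter"}, {"milk", "eggs"}]
print(apriori(baskets, min_support=0.5))
# -> {milk}, {bread}, {butter}, {milk, bread}, {bread, butter}
```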
Bayesian Terms
- The Likelihood Ratio is analogous to Lift, though not identical:
- Compares P(E | H) to P(E | \neg H)
- Lift compares with overall P(E).
- If the likelihood ratio is greater than one, the evidence favors H.
- If the likelihood ratio is less than one, the evidence favors the opposite of H.
- If the likelihood ratio is equal to one, then evidence is neutral.
- Bayes' rule = update beliefs with new evidence.
- Association rules = find co-occurrences.
- Posterior Odds = Prior Odds \times Likelihood Ratio
- Posterior Probability = Posterior Odds / (1 + Posterior Odds)
- Prior Odds = The ratio of the probability of a hypothesis being true to it being false, before seeing evidence.
- Likelihood Ratio = P(Evidence | Hypothesis) / P(Evidence | Not Hypothesis). A worked update is sketched after this list.
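A worked sketch of the update rule above, using hypothetical numbers for the prior and the two likelihoods:

```python
# Hypothetical numbers: prior P(H) = 0.10, P(E|H) = 0.80, P(E|not H) = 0.20.
p_h = 0.10
p_e_given_h = 0.80
p_e_given_not_h = 0.20

prior_odds = p_h / (1 - p_h)                      # 1:9 -> about 0.111
likelihood_ratio = p_e_given_h / p_e_given_not_h  # 4.0 > 1: evidence favors H
posterior_odds = prior_odds * likelihood_ratio    # about 0.444
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"posterior P(H | E) = {posterior_prob:.3f}")  # -> 0.308
```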
Boosting
- A machine learning technique that makes predictions more accurate by combining many simple models into one powerful model.
- Starts with one weak model.
- Observes the mistakes the model makes.
- Trains the next model to fix those mistakes.
- Repeats to create a final strong prediction.
- Does not rely on random resampling of the data; each round trains on the full dataset, focusing on the previous model's mistakes (see the sketch below).
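A minimal boosting-style sketch in the spirit of the steps above: each round fits a shallow tree to the current mistakes (residuals) and adds it to the ensemble. The data is synthetic and scikit-learn is assumed to be available; this is a gradient-boosting flavour, not any specific library's algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)  # start from a trivial (all-zero) model
stumps = []

for _ in range(100):
    residuals = y - prediction                       # the current mistakes
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    prediction += learning_rate * stump.predict(X)   # correct the mistakes
    stumps.append(stump)

print("final training MSE:", round(float(np.mean((y - prediction) ** 2)), 4))
```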