
Lecture Notes on Power Laws, Normal Distribution, Association Rules, and Boosting

Power Law vs. Normal Distribution

Power Law

  • Probability falls off slowly for large values, so observations far from the typical value remain plausible.
  • Dominated by outliers.
  • Relationship of the form: P(x) \propto x^{-a}
  • a is the exponent (or "power") that controls how quickly the probability drops.
  • Common in social and virtual-world quantities such as wealth, sales, web traffic, YouTube views, word frequency, and clicks.
  • Associated with "Extremistan," where the impact of power is considered significant.
  • Characterized by greater asymmetry, especially with more sales, views, or wealth.
  • Extreme values (very large or very small) are far more common than under a normal distribution.
  • Heavy-tailed distributions.
  • Plotting the logarithm of frequency against the logarithm of value (a log-log plot) produces a straight line.
  • Example:
    • Assume wealth follows a power law:
      • Having more than €8 million: ~1 in 4,000 people
      • Having more than €16 million: ~1 in 16,000 people
      • Having more than €32 million: ~1 in 64,000 people
      • Having more than €320 million: ~1 in 6.4 million people
  • Incomes Example: If two people jointly make $1M/year, the most likely split is highly unequal, with one person earning the bulk of it; extreme asymmetry is expected.
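
A minimal Python sketch of the wealth example above. The tail exponent a = 2 and the anchor point (1 in 4,000 above €8 million) are inferred from the figures in the example, not stated in the notes:

    # Power-law tail: P(X > x) = C * x**(-a).
    # Assumption (inferred from the example): a = 2, anchored so that
    # P(X > 8 million euros) = 1 / 4,000.

    A = 2.0                       # tail exponent ("power")
    X0, P0 = 8e6, 1 / 4000        # anchor: 1 in 4,000 people exceed 8M euros
    C = P0 * X0**A                # constant so that P(X > X0) = P0

    def tail_probability(x):
        """Probability of exceeding wealth x under the assumed power law."""
        return C * x**(-A)

    for millions in (8, 16, 32, 320):
        x = millions * 1e6
        p = tail_probability(x)
        print(f"more than {millions}M euros: about 1 in {1 / p:,.0f}")

Running this reproduces the 1 in 4,000 / 16,000 / 64,000 / 6.4 million figures: doubling the threshold multiplies the rarity by four.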

80/20 Rule

  • 80% of the effects come from 20% of the causes.
    • 20% of people hold 80% of the wealth.
    • 20% of products generate 80% of sales.
    • 20% of videos get 80% of views.
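
A small sketch of how the 80/20 pattern can be checked on data: given per-product sales, compute the share of total sales that comes from the top 20% of products. The sales figures below are made up for illustration:

    # Check what fraction of total sales comes from the top 20% of products.
    # The sales numbers are hypothetical, drawn to look heavy-tailed.

    sales = [5000, 2500, 1200, 600, 300, 150, 80, 40, 20, 10]  # one value per product

    sales_sorted = sorted(sales, reverse=True)
    top_n = max(1, round(0.2 * len(sales_sorted)))   # top 20% of products
    top_share = sum(sales_sorted[:top_n]) / sum(sales_sorted)

    print(f"top {top_n} of {len(sales_sorted)} products -> "
          f"{top_share:.0%} of all sales")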

Normal Distribution

  • Common in the physical world, such as weight, height, cholesterol, and blood counts.
  • Associated with "Mediocristan," emphasizing the middle of the curve.
  • Observations cluster closely around the mean, and the probability of a deviation falls off ever more steeply (much faster than under a power law) as you move away from the average.
  • Thin-tailed distributions.
  • Incomes Example: If two people jointly make $1M/year, the most likely split is $500K each — variation is minimal and symmetric.
  • Has a stable, well-defined mean and standard deviation, regardless of the range of values observed.
  • The mean is the center of the bell curve, and the standard deviation controls the width.
  • Example:
    • Average adult height is 1.67 meters.
      • Ten centimeters taller (1.77 meters): about 1 in 6.3 people
      • Twenty centimeters taller (1.87 meters): about 1 in 44 people
      • Thirty centimeters taller (1.97 meters): about 1 in 740 people
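
A short sketch reproducing the height figures above. It assumes a standard deviation of 0.10 m, which is not stated in the notes but is implied by the numbers (1 in 6.3, 1 in 44, and 1 in 740 correspond to one, two, and three standard deviations):

    from math import erfc, sqrt

    # Normal tail: P(Height > x) for a bell curve with the mean from the notes.
    # Assumption: standard deviation of 0.10 m, inferred from the example.

    MEAN, SD = 1.67, 0.10

    def tail_probability(x):
        """P(X > x) for a normal distribution with the given mean and SD."""
        z = (x - MEAN) / SD
        return 0.5 * erfc(z / sqrt(2))

    for height in (1.77, 1.87, 1.97):
        p = tail_probability(height)
        print(f"taller than {height:.2f} m: about 1 in {1 / p:,.1f}")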

Association Rules

  • Used to discover relationships among variables in large databases.
  • Example: Customers who buy bread often also buy butter.
  • Itemset: A collection of one or more items.
  • Transaction: A set of items bought together (shopping basket).
  • Frequent itemset: An itemset that appears in a dataset more frequently than a predefined threshold.

Key Metrics

  • Support
    • The proportion of transactions in the dataset that contain a particular itemset.
    • Fraction of transactions that contain both A and B.
    • Formula: Support(A \rightarrow B) = P(A \cap B)
    • Example: If 20 out of 100 transactions include both milk and bread, then the support is 0.20.
  • Confidence
    • How often items in B appear in transactions that contain A.
    • The proportion of transactions containing A that also contain B.
    • Formula: Confidence(A \rightarrow B) = P(B | A) = \frac{Support(A \cap B)}{Support(A)}
    • Measures the strength of the implication.
  • Lift
    • A measure of how much more likely B is to occur when A has occurred, compared to B occurring independently.
    • Formula: Lift(A \rightarrow B) = \frac{Confidence(A \rightarrow B)}{Support(B)}
    • Lift = \frac{P(B|A)}{P(B)} = \frac{Confidence}{P(B)}
    • Lift > 1: Positive correlation (A and B occur more together than expected).
    • Lift = 1: A and B are independent.
    • Lift < 1: Negative correlation.
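
A worked computation of the three metrics for a single rule, {milk} -> {bread}, on a small hypothetical set of transactions:

    # Support, confidence and lift for the rule {milk} -> {bread},
    # computed from a small made-up list of transactions.

    transactions = [
        {"milk", "bread"},
        {"milk", "bread", "butter"},
        {"bread", "eggs"},
        {"milk", "bread", "eggs"},
        {"eggs", "butter"},
    ]

    def support(itemset):
        """Fraction of transactions containing every item in the itemset."""
        return sum(itemset <= t for t in transactions) / len(transactions)

    A, B = {"milk"}, {"bread"}
    supp_rule = support(A | B)             # Support(A -> B) = P(A and B)
    confidence = supp_rule / support(A)    # P(B | A)
    lift = confidence / support(B)         # P(B | A) / P(B)

    print(f"support    = {supp_rule:.2f}")
    print(f"confidence = {confidence:.2f}")
    print(f"lift       = {lift:.2f}")

Here the lift comes out above 1, so milk and bread co-occur more often than they would if they were bought independently.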

Apriori Algorithm

  • Used to identify all frequent itemsets in a dataset.
  • A subset of a frequent itemset must also be frequent.
  • Support threshold: Used to eliminate infrequent itemsets early.
  • New itemsets are generated by joining frequent itemsets of size k to create sets of size k + 1.
  • Can be computationally expensive with large datasets.
  • Pruning: Avoiding checking larger combinations if any of their subsets were already found to be infrequent.
    • Example: If milk and eggs are not frequent, then milk, bread, and eggs are also not frequent.
  • Uses a bottom-up approach, extending frequent itemsets one item at a time.
  • Requires a minimum support threshold.
  • Lowering the minimum support increases the number of rules generated.
  • A rule is strong only if it has both high support and high confidence.
  • The confidence of a rule can never be less than its support, since Confidence(A \rightarrow B) = \frac{Support(A \cap B)}{Support(A)} and Support(A) \leq 1.
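
A minimal sketch of the bottom-up Apriori idea on hypothetical transactions: frequent itemsets are grown one item at a time, and any candidate with an infrequent subset is pruned. In this data {milk, eggs} is infrequent, so {milk, bread, eggs} is never counted, matching the pruning example above. The transactions and the 0.4 threshold are illustrative choices:

    from itertools import combinations

    transactions = [
        {"milk", "bread", "butter"},
        {"milk", "bread"},
        {"bread", "eggs"},
        {"milk", "bread", "eggs"},
        {"milk", "butter"},
    ]
    MIN_SUPPORT = 0.4  # itemset must appear in at least 40% of transactions

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    frequent = [{frozenset([i]) for i in items if support({i}) >= MIN_SUPPORT}]

    # Level k+1: join frequent k-itemsets, prune, then test support.
    while frequent[-1]:
        prev = frequent[-1]
        candidates = {a | b for a in prev for b in prev if len(a | b) == len(a) + 1}
        # Apriori pruning: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, len(c) - 1))}
        frequent.append({c for c in candidates if support(c) >= MIN_SUPPORT})

    for level, itemsets in enumerate(frequent, start=1):
        for itemset in sorted(itemsets, key=sorted):
            print(level, sorted(itemset), f"support={support(itemset):.2f}")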

Bayesian Terms

  • Likelihood Ratio plays a role analogous to Lift (related, but not identical).
    • The Likelihood Ratio compares P(E | H) to P(E | \neg H).
    • Lift, by contrast, compares P(E | H) to the overall P(E).
    • If the likelihood ratio is greater than one, the evidence favors H.
    • If the likelihood ratio is less than one, the evidence favors the opposite of H.
    • If the likelihood ratio is equal to one, then evidence is neutral.
  • Bayesian rules = update beliefs with new evidence.
  • Association rules = find co-occurrences.
  • Posterior Odds = Prior Odds x Likelihood Ratio
  • Posterior Probability = Posterior Odds / (1 + Posterior Odds)
  • Prior Odds = The ratio of the probability of a hypothesis being true to it being false, before seeing evidence
  • Likelihood Ratio = P(Evidence | Hypothesis) / P(Evidence | Not Hypothesis)
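
A small sketch of the odds-form update above; the prior and the two conditional probabilities are made-up numbers:

    # Updating a belief with a likelihood ratio (odds form of Bayes' rule).

    def posterior_probability(prior_prob, p_e_given_h, p_e_given_not_h):
        """P(H | E) via: posterior odds = prior odds * likelihood ratio."""
        prior_odds = prior_prob / (1 - prior_prob)
        likelihood_ratio = p_e_given_h / p_e_given_not_h
        posterior_odds = prior_odds * likelihood_ratio
        return posterior_odds / (1 + posterior_odds)

    # Example: prior belief 0.10, evidence 4x more likely under H than under not-H.
    print(posterior_probability(prior_prob=0.10, p_e_given_h=0.8, p_e_given_not_h=0.2))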

Boosting

  • A machine learning technique that makes predictions more accurate by combining many simple models into one powerful model.
  • Starts with one weak model.
  • Observes the mistakes the model makes.
  • Trains the next model to fix those mistakes.
  • Repeats to create a final strong prediction.
  • Does not rely on random resampling (as bagging does); each round works with the full dataset, paying more attention to the examples the previous models got wrong.
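
A small sketch of the boosting loop described above, in the style of gradient boosting: each new weak model (a depth-1 tree, or "stump") is fitted to the residuals, i.e. the mistakes, of the ensemble so far. The toy data, the scikit-learn stump, and the 50 rounds / 0.1 learning rate are illustrative choices, not from the lecture:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)   # toy regression data

    n_rounds, learning_rate = 50, 0.1
    prediction = np.zeros_like(y)      # start from a trivial model (all zeros)
    models = []

    for _ in range(n_rounds):
        residuals = y - prediction               # mistakes of the current ensemble
        stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
        models.append(stump)
        prediction += learning_rate * stump.predict(X)   # correct part of the mistakes

    print("mean squared error:", np.mean((y - prediction) ** 2))

Each pass shrinks the residuals a little, so the combined prediction of many weak stumps ends up far more accurate than any single one.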