Lecture Notes on Power Laws, Normal Distribution, Association Rules, and Boosting
Power Law vs. Normal Distribution
Power Law
- Reflects a slow drop-off in likelihood, so values far from the typical range remain plausible.
- Dominated by outliers.
- Relationship of the form: P(x) \propto x^{-a}
- a is the exponent (or "power") that controls how quickly the probability drops.
- Common in social and virtual-world contexts: wealth, sales, web traffic, YouTube views, word frequencies, and clicks.
- Associated with "Extremistan," where the impact of power is considered significant.
- Highly asymmetric: a small share of items accounts for most of the sales, views, or wealth.
- Extreme values (very large or very small) are far more common than under a normal distribution.
- Heavy-tailed distributions.
- Plotting the logarithm of frequency against the logarithm of value (a log-log plot) produces a straight line with slope -a.
- Example:
- Assume wealth follows a power law; each doubling of wealth makes it four times rarer (see the sketch after this list):
- Having more than €8 million: ~1 in 4,000 people
- Having more than €16 million: ~1 in 16,000 people
- Having more than €32 million: ~1 in 64,000 people
- Having more than €320 million: ~1 in 6.4 million people
- Incomes Example: If two people jointly make $1M/year, the most likely split is highly unequal, e.g., one person making $950K and the other $50K.
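A minimal sketch reproducing the wealth odds above, assuming a Pareto-style tail P(X > x) \propto x^{-2} (the exponent these figures imply); the calibration constant is chosen to match the first line of the example:

```python
# Assumed Pareto-style tail: P(X > x) = C * x**(-a), with a = 2
# (the exponent implied by "doubling wealth -> four times rarer").
a = 2.0
C = (1 / 4000) * (8e6 ** a)  # calibrate so that P(X > 8M euros) = 1 in 4,000

for x in [8e6, 16e6, 32e6, 320e6]:
    p = C * x ** (-a)
    print(f"P(X > {x / 1e6:.0f}M euros) = 1 in {1 / p:,.0f}")
# -> 1 in 4,000; 1 in 16,000; 1 in 64,000; 1 in 6,400,000
```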
80/20 Rule
- 80% of the effects come from 20% of the causes.
- 20% of people hold 80% of the wealth.
- 20% of products generate 80% of sales.
- 20% of videos get 80% of views.
Normal Distribution
- Common in the physical world, such as weight, height, cholesterol, and blood counts.
- Associated with "Mediocristan," emphasizing the middle of the curve.
- Observations cluster around the mean, and the probability of a deviation declines rapidly as you move away from it.
- Thin-tailed distributions.
- Incomes Example: If two people jointly make $1M/year, the most likely split is $500K each — variation is minimal and symmetric.
- Has a well-defined mean and standard deviation, which remain stable regardless of the range of the data.
- The mean is the center of the bell curve, and the standard deviation controls the width.
- Example (assuming a standard deviation of about 10 cm; see the sketch after this list):
- Average adult height is 1.67 meters.
- Ten centimeters taller (1.77 meters): about 1 in 6.3 people.
- Twenty centimeters taller (1.87 meters): about 1 in 44 people.
- Thirty centimeters taller (1.97 meters): about 1 in 740 people.
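A minimal sketch reproducing these odds, assuming heights are normal with mean 1.67 m; the standard deviation of 0.10 m is an assumption inferred from the quoted odds, not stated in the notes. Requires scipy:

```python
from scipy.stats import norm

mean, sd = 1.67, 0.10  # the SD is an assumption inferred from the quoted odds
for height in [1.77, 1.87, 1.97]:
    p = norm.sf(height, loc=mean, scale=sd)  # tail probability P(X > height)
    print(f"P(height > {height} m) = 1 in {1 / p:.1f}")
# -> 1 in 6.3, 1 in 44.0, 1 in 740.8: the normal tail thins out quickly
```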
Association Rules
- Used to discover relationships among variables in large databases.
- Example: Customers who buy bread often also buy butter.
- Itemset: A collection of one or more items.
- Transaction: A set of items bought together (shopping basket).
- Frequent itemset: An itemset whose support meets or exceeds a predefined threshold.
Key Metrics
- Support
- The proportion of transactions in the dataset that contain a particular itemset.
- For a rule A \rightarrow B: the fraction of transactions that contain both A and B.
- Formula: Support(A \rightarrow B) = P(A \cap B)
- Example: If 20 out of 100 transactions include both milk and bread, then the support is 0.20.
- Confidence
- How often items in B appear in transactions that contain A.
- The proportion of transactions containing A that also contain B.
- Formula: Confidence(A \rightarrow B) = P(B | A) = \frac{Support(A \cap B)}{Support(A)}
- Measures the strength of the implication.
- Lift
- A measure of how much more likely B is to occur when A has occurred, compared to B occurring independently.
- Formula: Lift(A \rightarrow B) = \frac{Confidence(A \rightarrow B)}{Support(B)} = \frac{P(B|A)}{P(B)}
- Lift > 1: Positive correlation (A and B occur more together than expected).
- Lift = 1: A and B are independent.
- Lift < 1: Negative correlation (A and B occur together less than expected). A worked sketch computing all three metrics follows this list.
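A minimal sketch computing support, confidence, and lift for the hypothetical rule {bread} \rightarrow {butter}; the items and baskets are made up for illustration:

```python
# A toy basket dataset (hypothetical items and transactions).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "milk"},
]
n = len(transactions)

# Metrics for the rule {bread} -> {butter}.
support_A  = sum("bread" in t for t in transactions) / n               # P(A)
support_B  = sum("butter" in t for t in transactions) / n              # P(B)
support_AB = sum({"bread", "butter"} <= t for t in transactions) / n   # P(A and B)

confidence = support_AB / support_A  # P(B | A)
lift       = confidence / support_B  # P(B | A) / P(B)

print(f"support={support_AB:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
# -> support=0.40, confidence=0.50, lift=0.83
#    (lift < 1: slightly negative association in this toy data)
```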
Apriori Algorithm
- Used to identify all frequent itemsets in a dataset.
- Apriori property: every subset of a frequent itemset must also be frequent.
- Support threshold: Used to eliminate infrequent itemsets early.
- New itemsets are generated by joining frequent itemsets of size k to create sets of size k + 1.
- Can be computationally expensive with large datasets.
- Pruning: skip larger combinations if any of their subsets have already been found infrequent.
- Example: If milk and eggs are not frequent, then milk, bread, and eggs are also not frequent.
- Uses a bottom-up approach, extending frequent itemsets one item at a time (see the sketch after this list).
- Requires a minimum support threshold.
- Lowering the minimum support increases the number of rules generated.
- A rule is strong only if it has both high support and high confidence.
- The support of a rule can never be greater than its confidence, since Confidence(A \rightarrow B) = Support(A \cap B) / Support(A) and Support(A) \le 1.
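A compact sketch of the algorithm under the definitions above; the baskets and the min_support value are hypothetical, and real implementations optimize candidate generation far more aggressively:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets, extending them one item at a time."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    levels = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

    k = 1
    while levels[-1]:
        candidates = set()
        for a in levels[-1]:
            for b in levels[-1]:
                union = a | b
                # Keep only (k+1)-sets whose k-subsets are all frequent (pruning).
                if len(union) == k + 1 and all(
                    frozenset(sub) in levels[-1] for sub in combinations(union, k)
                ):
                    candidates.add(union)
        levels.append({c for c in candidates if support(c) >= min_support})
        k += 1

    return [itemset for level in levels for itemset in level]

baskets = [{"milk", "bread"}, {"bread", "butter"},
           {"milk", "bread", "butter"}, {"milk", "eggs"}]
print(apriori(baskets, min_support=0.5))
# -> {milk}, {bread}, {butter}, {milk, bread}, {bread, butter}
```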
Bayesian Terms
- The Likelihood Ratio is analogous to Lift, though not identical:
- Compares P(E | H) to P(E | \neg H)
- Lift compares with overall P(E).
- If the likelihood ratio is greater than one, the evidence favors H.
- If the likelihood ratio is less than one, the evidence favors the opposite of H.
- If the likelihood ratio is equal to one, then evidence is neutral.
- Bayes' rule = update beliefs with new evidence.
- Association rules = find co-occurrences.
- Posterior Odds = Prior Odds \times Likelihood Ratio
- Posterior Probability = Posterior Odds / (1 + Posterior Odds)
- Prior Odds = The ratio of the probability of a hypothesis being true to it being false, before seeing evidence.
- Likelihood Ratio = P(Evidence | Hypothesis) / P(Evidence | Not Hypothesis). A worked update is sketched after this list.
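A worked sketch of the update rule above, using hypothetical numbers for the prior and the two likelihoods:

```python
# Hypothetical numbers: prior P(H) = 0.10, P(E|H) = 0.80, P(E|not H) = 0.20.
p_h = 0.10
p_e_given_h = 0.80
p_e_given_not_h = 0.20

prior_odds = p_h / (1 - p_h)                      # 1:9 -> about 0.111
likelihood_ratio = p_e_given_h / p_e_given_not_h  # 4.0 > 1: evidence favors H
posterior_odds = prior_odds * likelihood_ratio    # about 0.444
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"posterior P(H | E) = {posterior_prob:.3f}")  # -> 0.308
```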
Boosting
- A machine learning technique that makes predictions more accurate by combining many simple models into one powerful model.
- Starts with one weak model.
- Observes the mistakes the model makes.
- Trains the next model to fix those mistakes.
- Repeats to create a final strong prediction.
- Does not rely on random resampling of the data; each round trains on the full dataset, focusing on the previous model's mistakes (see the sketch below).
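A minimal boosting-style sketch in the spirit of the steps above: each round fits a shallow tree to the current mistakes (residuals) and adds it to the ensemble. The data is synthetic and scikit-learn is assumed to be available; this is a gradient-boosting flavour, not any specific library's algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)  # start from a trivial (all-zero) model
stumps = []

for _ in range(100):
    residuals = y - prediction                       # the current mistakes
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    prediction += learning_rate * stump.predict(X)   # correct the mistakes
    stumps.append(stump)

print("final training MSE:", round(float(np.mean((y - prediction) ** 2)), 4))
```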