Lesson 2

Introduction

  • Importance of feedback during lectures for effective learning.

  • Previous topic covered: Probability concepts, including probability mass functions and density functions.

Probability Density Functions (PDFs)

  • A PDF is used when the random variable is continuous or real valued.

  • Normal distribution is a common type of PDF but not the only one.

  • A PDF maps real values to densities; the density at a point is not itself a probability, and probabilities are obtained by integrating the density over an interval.

  • Density indicates how much probability is concentrated in a small range of values around a point, not the probability of that exact value (which is zero for a continuous variable); see the sketch below.
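
A minimal numerical sketch of this point, assuming a normal distribution with illustrative parameters: the density at a point can exceed 1, while probabilities come from integrating over a range, and the total area under the curve is 1.

```python
import numpy as np
from scipy.stats import norm

# Illustrative normal distribution (mean 0, standard deviation 0.1 -- assumed values).
dist = norm(loc=0.0, scale=0.1)

# The density at a single point is not a probability; here it is ~3.99.
print(dist.pdf(0.0))

# Probabilities come from integrating the density over a range of values.
print(dist.cdf(0.1) - dist.cdf(-0.1))   # P(-0.1 <= X <= 0.1) ~ 0.68

# The total area under the PDF is 1 (simple Riemann-sum check).
xs = np.linspace(-1.0, 1.0, 10_000)
print((dist.pdf(xs) * (xs[1] - xs[0])).sum())   # ~1.0
```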

Visualizing Density

  • High density regions: Numerous observations clustered within a small area.

  • Low density regions: Few observations scattered over an area.

  • Example: Comparing run rates in cricket; consistent high averages versus outliers.

Probability Mass Functions (PMFs)

  • Used for discrete random variables.

  • Maps unique outcomes to probabilities (0 to 1).

  • Example: India winning (0.68) vs. not winning (0.32); this is a Bernoulli distribution.

Categorical Variables

  • A PMF extends naturally to more than two outcomes (a categorical distribution).

  • Example: Height categorization (tall, medium, short) mapped to probabilities; see the sketch below.
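
A minimal sketch of both examples as PMFs, using the 0.68/0.32 values from the cricket example and assumed illustrative probabilities for the height categories; it checks that each PMF sums to 1 and draws a sample.

```python
import random

# Bernoulli PMF from the cricket example: win vs. not win.
bernoulli_pmf = {"win": 0.68, "not_win": 0.32}

# Categorical PMF for height categories (probabilities assumed for illustration).
height_pmf = {"tall": 0.25, "medium": 0.55, "short": 0.20}

for name, pmf in [("match outcome", bernoulli_pmf), ("height", height_pmf)]:
    assert all(p >= 0 for p in pmf.values())        # non-negative probabilities
    assert abs(sum(pmf.values()) - 1.0) < 1e-9      # probabilities sum to 1
    sample = random.choices(list(pmf), weights=list(pmf.values()))[0]
    print(name, sample)
```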

Key Constraints for PMFs and PDFs

  • Sum of probabilities for all outcomes must equal 1 (PMFs).

  • Probabilities must be non-negative for all outcomes.

  • For PDFs, the area under the curve (integral) over all x should equal 1.
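
Written out for a discrete PMF over outcomes x_i and a continuous PDF f(x), these constraints are:

    \sum_i P(X = x_i) = 1, \qquad P(X = x_i) \ge 0 \text{ for all } i

    \int_{-\infty}^{\infty} f(x)\, dx = 1, \qquad f(x) \ge 0 \text{ for all } x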

Applications in Data Science

  • Machine learning is built on a foundation of probability.

  • Decision-making under uncertainty is modeled via probability distributions, which is crucial for machine learning.

  • Example: Predicting outcomes in cricket matches using historical data.

    • Probability of winning: Historical analysis leads to estimated probabilities, which guide further decisions.

Learning and Estimating Parameters

  • Bernoulli trials are used for binary outcomes such as win/not-win.

  • Learning from data is necessary when the actual probability (theta) is unknown.

  • Modeled outcomes: Analyzing whether historical outcomes truly reflect the underlying probabilities.

Maximum Likelihood Estimation (MLE)

  • MLE principle: Choose theta that maximizes the probability of observed data.

  • If the form of the probability distribution is known, maximizing the likelihood of the observed data yields an estimate of the underlying parameters.

    • Example assumptions: the probability distribution is fixed, and observations are independent given the distribution; see the sketch below.
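
A minimal sketch of the MLE idea for the Bernoulli case, assuming a small illustrative win/not-win dataset: evaluate the likelihood over a grid of candidate theta values and pick the maximizer, which lands on the observed win fraction.

```python
import numpy as np

# Illustrative data (assumed): 1 = win, 0 = not win.
outcomes = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 1])
wins, n = outcomes.sum(), len(outcomes)

# Likelihood of IID Bernoulli data: theta^wins * (1 - theta)^(n - wins).
thetas = np.linspace(0.001, 0.999, 999)
likelihood = thetas**wins * (1 - thetas)**(n - wins)

# The grid maximizer matches the win fraction (~0.7 for this data).
theta_hat = thetas[np.argmax(likelihood)]
print(theta_hat, wins / n)
```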

Uncertainties in Estimation

  • Estimates come with uncertainty represented through confidence intervals.

  • Example: If theta = 0.7, confidence interval could suggest actual theta may range from 0.55 to 0.85.

  • Larger datasets generally reduce uncertainty, as the sketch below illustrates.
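
One common way such an interval is computed is the normal-approximation (Wald) interval, sketched below with assumed numbers; it also shows the interval tightening as the sample size grows.

```python
import math

def wald_interval(theta_hat, n, z=1.96):
    """Approximate 95% confidence interval for a Bernoulli parameter."""
    half_width = z * math.sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat - half_width, theta_hat + half_width

# Same estimated theta, increasing amounts of data (illustrative sample sizes).
for n in [20, 100, 1000]:
    print(n, wald_interval(0.7, n))
# Larger n gives a narrower interval, i.e. less uncertainty about theta.
```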

Understanding Independence

  • IID (Independent and Identically Distributed) is crucial for probability modeling.

  • Under IID, each observation follows the same distribution with the same parameter, which is what allows the likelihood used in MLE to factor into per-observation terms (see below).
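
Concretely, the IID assumption lets the joint probability of the data factor into a product of identical per-observation terms, which is the quantity MLE maximizes. For n Bernoulli observations x_1, ..., x_n containing k wins:

    P(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta) = \theta^{k} (1 - \theta)^{n - k}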

Data Assumptions and Generative Processes

  • Observations generated under a particular process (e.g., a cricket match).

  • Assumption: Same probability distribution governs successive observations.

Log Likelihood Function

  • Log transformation used to simplify derivative calculations for optimization.

  • The optimal theta turns out to be the ratio of wins to total attempts.

  • Result: The estimated probability of success (theta) equals the number of wins divided by the total number of observations; the derivation follows.
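
The derivation behind that result: take the log of the Bernoulli likelihood, differentiate with respect to theta, and set the derivative to zero (k wins out of n observations).

    \log L(\theta) = k \log \theta + (n - k) \log (1 - \theta)

    \frac{d}{d\theta} \log L(\theta) = \frac{k}{\theta} - \frac{n - k}{1 - \theta} = 0 \;\;\Rightarrow\;\; \hat{\theta} = \frac{k}{n}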

Conclusion

  • Next discussions will include more complex models like logistic regression to consider additional influencing factors (e.g., player morale).

  • Understanding probability and estimation leads to more confident decision-making.
