Importance of feedback during lectures for effective learning.
Previous topic covered: probability concepts, including probability mass functions (PMFs) and probability density functions (PDFs).
A PDF is used when the random variable is continuous (real-valued).
Normal distribution is a common type of PDF but not the only one.
A PDF maps real values to densities, not probabilities: a density value can exceed 1, and probabilities are obtained by integrating the density over an interval.
Density describes how concentrated values are in a small range around a point, not the probability of any exact value (which is zero for a continuous variable).
High density regions: Numerous observations clustered within a small area.
Low density regions: Few observations scattered over an area.
Example: Comparing run rates in cricket; consistent high averages versus outliers.
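To make the density idea concrete, here is a minimal Python sketch (function name and numbers are illustrative, not from the lecture) evaluating a normal density: the density itself can exceed 1, and probabilities come only from integrating it over an interval.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of Normal(mu, sigma^2) at x: a density value, not a probability.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# A density can exceed 1 when the distribution is tightly concentrated:
print(normal_pdf(0.0, sigma=0.1))  # ~3.99

# Probabilities come from integrating the density over an interval;
# here a crude Riemann sum approximates P(-1 <= X <= 1) for a standard normal:
step = 0.001
prob = sum(normal_pdf(-1 + i * step) for i in range(2000)) * step
print(prob)  # ~0.683
```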
A PMF is used for discrete random variables.
It maps each distinct outcome to a probability between 0 and 1.
Example: India winning (0.68) vs. not winning (0.32); this is a Bernoulli distribution.
With more than two outcomes, the PMF extends to a categorical distribution.
Example: Height categorization (tall, medium, short) mapped to probabilities.
For PMFs, the probabilities of all outcomes must sum to 1.
Probabilities must be non-negative for all outcomes.
For PDFs, the area under the curve (integral) over all x should equal 1.
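A small Python sketch of these validity conditions, using made-up numbers for the height-category example (the lecture gave explicit probabilities only for the Bernoulli case):

```python
# Hypothetical numbers for the height-category example above; the lecture
# did not give specific values, so these are invented for illustration.
height_pmf = {"tall": 0.25, "medium": 0.45, "short": 0.30}

# PMF validity: non-negative probabilities that sum to 1.
assert all(p >= 0 for p in height_pmf.values())
assert abs(sum(height_pmf.values()) - 1.0) < 1e-9

# The Bernoulli example from the lecture satisfies the same conditions:
win_pmf = {"win": 0.68, "not win": 0.32}
assert abs(sum(win_pmf.values()) - 1.0) < 1e-9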
The foundations of machine learning rely on understanding probability.
Decision-making under uncertainty is modeled via probability distributions, which is crucial for machine learning.
Example: Predicting outcomes in cricket matches using historical data.
Probability of winning: Historical analysis leads to estimated probabilities, which guide further decisions.
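As a toy illustration (the match record below is invented, not real data), the win probability can be estimated as the fraction of historical wins; the MLE derivation later in these notes justifies exactly this estimator.

```python
# Invented match record (1 = win, 0 = not win); not real cricket data.
history = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Point estimate of the win probability: the fraction of observed wins.
theta_hat = sum(history) / len(history)
print(theta_hat)  # 0.7 on this made-up record
```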
Bernoulli trials model binary outcomes such as win vs. not win.
Learning from data is necessary when the actual probability (theta) is unknown.
Modeled outcomes: analyzing whether historical outcomes truly reflect the underlying probabilities.
MLE principle: choose the theta that maximizes the probability of the observed data.
If the form of the distribution is known, maximizing the likelihood of the observed data estimates its underlying parameters.
Assumption: the distribution is fixed, and observations are independent given that distribution.
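In symbols, for n IID Bernoulli observations x_1, ..., x_n in {0, 1} with unknown parameter theta (the standard formulation, consistent with the lecture's setup), the likelihood is:

```latex
L(\theta) = \prod_{i=1}^{n} \theta^{x_i}\,(1-\theta)^{1-x_i}
          = \theta^{k}\,(1-\theta)^{n-k},
\qquad k = \sum_{i=1}^{n} x_i \ \text{(number of wins)}.
```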
Estimates come with uncertainty represented through confidence intervals.
Example: if the estimated theta is 0.7, a confidence interval might indicate that the true theta lies between 0.55 and 0.85.
Larger datasets generally reduce uncertainty.
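One common way to compute such an interval is the normal-approximation (Wald) formula sketched below; the lecture did not specify which interval it used, so this is an illustrative choice. It also shows the interval narrowing as the sample size grows.

```python
import math

# Normal-approximation (Wald) interval for a proportion. The lecture did not
# name a specific interval, so this is one illustrative choice.
def wald_interval(theta_hat, n, z=1.96):  # z = 1.96 gives roughly 95% coverage
    se = math.sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat - z * se, theta_hat + z * se

print(wald_interval(0.7, 40))    # about (0.56, 0.84): wide with little data
print(wald_interval(0.7, 400))   # about (0.66, 0.74): narrower with more data
```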
IID (Independent and Identically Distributed) is crucial for probability modeling.
Under the IID assumption, every observation follows the same distribution with the same theta, which is what the MLE calculation relies on.
Observations generated under a particular process (e.g., a cricket match).
Assumption: Same probability distribution governs successive observations.
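A short simulation sketch of the IID assumption (parameter values are arbitrary): every draw uses the same theta and ignores the draws before it.

```python
import random

random.seed(0)

# Each draw uses the same theta (identically distributed) and ignores the
# previous draws (independent) -- a direct simulation of the IID assumption.
def simulate_matches(theta, n):
    return [1 if random.random() < theta else 0 for _ in range(n)]

outcomes = simulate_matches(0.7, 1000)
print(sum(outcomes) / len(outcomes))  # close to 0.7 for large n
```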
Taking the log of the likelihood simplifies the derivative calculation for optimization (products become sums).
Optimal theta is found by setting the derivative of the log-likelihood to zero.
Result: the estimated probability of success (theta) equals the number of wins divided by the total number of observations.
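Filling in the steps behind this result: taking logs turns the product into a sum, and setting the derivative to zero gives the wins-over-total estimator (k wins in n observations).

```latex
\log L(\theta) = k \log\theta + (n-k)\log(1-\theta)
\qquad
\frac{d}{d\theta}\log L(\theta) = \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0
\;\Longrightarrow\;
\hat{\theta} = \frac{k}{n}
```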
Next discussions will include more complex models like logistic regression to consider additional influencing factors (e.g., player morale).
Understanding probability and estimation leads to more confident decision-making processes.