
Model Evaluation: K-Fold and Leave-One-Out Cross-Validation

  • Introduction to Machine Learning in Healthcare

    • Focus on practical application of machine learning in healthcare.
    • Discussion on training machine learning models specifically for healthcare problems.
  • Knowledge Discovery Process

    • Raw data needs to be filtered to become usable.
    • Classification is essential to identify relevant patterns.
    • Example of vast datasets (like smartwatch readings) often being mostly irrelevant (the curse of dimensionality).
    • Need to scale down data to extract useful attributes and identify significant patterns.
  • Types of Machine Learning Tasks

    • Clustering: Identifying groups within data based on shared characteristics.
    • Example: clustering patients based on symptoms (see the sketch after this list).
    • Predictive Tasks: Feeding data to a network to receive predictions (like health conditions).
    • Real-world example of smartphones identifying dog breeds through images.
    • Naive Bayes, decision trees, and artificial neural networks are mentioned as candidate classifiers.
    • Focus on decision trees for simplicity and explainability.
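
A minimal sketch of the clustering idea referenced above, assuming k-means and invented symptom features (the feature names, values, and choice of algorithm are illustrative assumptions, not the lecture's own example):

```python
# Hypothetical sketch: grouping patients by symptom measurements with k-means.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one patient: [resting heart rate, hours of sleep, self-reported pain 0-10]
patients = np.array([
    [62, 7.5, 1],
    [88, 5.0, 6],
    [59, 8.0, 0],
    [91, 4.5, 7],
    [65, 7.0, 2],
])

# Ask for two clusters; similar patients end up with the same label.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(patients)
print(kmeans.labels_)  # e.g. [0 1 0 1 0]: two groups of similar patients
```
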
  • Classification Problems in Healthcare

    • Example dataset from students with attributes: activity, sleep time, blood pressure, heart rate, heart rate variability, stress levels (binary).

    • Predictor attributes are activity, sleep time, etc.; the class attribute is stress level, i.e., the value to be predicted (a sketch of such a dataset follows this list).

    • Explanation of how to classify new data points based on learned patterns from the dataset.
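
As a rough sketch of what such a dataset could look like in code; the column names and values are hypothetical and only mirror the attributes listed above:

```python
# Hypothetical stress dataset: predictor attributes plus a binary class attribute.
import pandas as pd

data = pd.DataFrame({
    "activity_minutes":   [30, 10, 60, 5, 45],
    "sleep_hours":        [8.0, 5.5, 7.5, 4.0, 6.5],
    "blood_pressure_sys": [115, 135, 118, 140, 125],
    "heart_rate_bpm":     [62, 85, 60, 90, 72],
    "hrv_ms":             [70, 35, 75, 28, 50],
    "stress_level":       ["low", "high", "low", "high", "low"],  # class attribute
})

X = data.drop(columns="stress_level")  # predictor attributes
y = data["stress_level"]               # class attribute to be predicted
```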

  • Decision Tree Model

    • A visual representation helps to understand predictions and decision paths clearly.
    • Explains step by step how to follow attributes (e.g., sleeping time, heart rate variability) down the tree to reach a prediction; a minimal code sketch follows this list.
    • Decision trees allow explainability, which is crucial in healthcare for trust and accountability.
    • Comparison to neural networks which are termed "black box" models due to opacity in decision-making.
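
A minimal sketch of fitting and inspecting a decision tree with scikit-learn on invented stress data; the feature names and values are assumptions for illustration, not the lecture's dataset:

```python
# Fit a small decision tree and print its rules, which is what makes it explainable.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

X = pd.DataFrame({
    "sleep_hours": [8.0, 5.5, 7.5, 4.0, 6.5, 5.0],
    "hrv_ms":      [70, 35, 75, 28, 50, 40],
    "heart_rate":  [62, 85, 60, 90, 72, 80],
})
y = ["low", "high", "low", "high", "low", "high"]  # stress level (class attribute)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The printed rules read like the stepwise paths described above,
# e.g. "hrv_ms <= 45 -> class: high"; a neural network offers no such readable trace.
print(export_text(tree, feature_names=list(X.columns)))
```
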
  • Evaluating Machine Learning Models

    • Introduces cross-validation as a method for ensuring model reliability.
    • K-Fold Cross-Validation: Dividing data into k chunks, training on k-1 chunks and testing on the remaining one.
    • Helps assess how robust the model is and exposes overfitting before deployment.
    • Importance of stratification: each fold should preserve a representative mix of classes in both training and testing sets (see the sketch after this list).
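
A sketch of stratified k-fold cross-validation in scikit-learn; the synthetic data merely stands in for the student dataset and is an assumption for illustration:

```python
# Stratified 5-fold cross-validation of a decision tree on synthetic binary data.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 100 samples, 5 predictor attributes, binary stress label.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Stratification keeps the low/high class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print(scores)         # one accuracy per held-out fold
print(scores.mean())  # overall cross-validated estimate
```
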
  • Challenges in Model Evaluation

    • Plain K-fold can be overly optimistic when data from the same participants appears in both training and test folds: the model does well on "seen" individuals but draws wrong conclusions on unseen ones.
    • Generalizing well to new data is crucial in healthcare contexts, where unseen patient data is the norm.
  • Leave-One-Out Cross-Validation

    • An alternative that holds out each participant's data in turn to evaluate the model's predictive ability on entirely new data (see the sketch after this list).
    • More reflective of real-world applications, though potentially computationally expensive.
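
A sketch of leave-one-participant-out evaluation, expressed here with scikit-learn's LeaveOneGroupOut; the synthetic data and group structure are assumptions for illustration:

```python
# Leave-one-participant-out: every split tests on one participant never seen in training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 100 samples from 10 participants (10 samples each), 5 features per sample.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
groups = np.repeat(np.arange(10), 10)  # participant ID for every sample

# Each of the 10 splits trains on 9 participants and tests on the one left out.
logo = LeaveOneGroupOut()
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=logo, groups=groups)

# Often lower than plain k-fold when samples within a participant are correlated,
# but closer to the unseen-patient setting described above.
print(scores.mean())
```
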
  • Final Thoughts

    • Emphasizes the importance of combining good modeling with thorough evaluation practices.
    • Challenges of machine learning in healthcare revolve around the requirement for interpretability and reliability due to high-stakes consequences.
    • Encourages looking beyond standard methodologies when specific application contexts (like healthcare) demand it.
  • Next Steps

    • Upcoming lectures will cover practical eHealth systems and their real-world implementations.
    • Questions welcomed.