
Module Recap and Exam Preparation Notes

  • Recap of Module Topics

    • Today's lecture summarizes all topics covered in the module.
    • Students are encouraged to complete the homework before Thursday's discussion.
    • Students will be shown a past exam paper and will work on two specific questions before Thursday.
  • Session Structure

    • The upcoming session on Thursday will be student-led to ensure engagement and understanding.
    • This places the emphasis on students preparing in advance.
  • Overview of Medical Data

    • Medical data is crucial in healthcare delivery, including in primary care, hospitals, and referrals.
    • Functions of medical data:
      • Categorizing problems.
      • Understanding disease development and spread.
      • Supporting decisions on treatments.
    • Types of medical data:
      • Narrative text data (historically dominant).
      • Numerical measurements (e.g., blood pressure, glucose levels).
      • Signal recordings (e.g., EEG).
      • Images (e.g., MRI scans).
  • Difference Between Data, Information, and Knowledge

    • Data: Raw entities or values (e.g., temperature, medical history).
    • Information: Interpretation of data (e.g., determining if a temperature reading is high).
    • Knowledge: Insights gained through reasoning, studies, and comparisons of data (e.g., link between high sugar levels and diabetes risk).
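
    A small Python sketch of the data-versus-information distinction above. The 38.0 °C fever cut-off and the readings are hypothetical, chosen only for illustration. Knowledge, by contrast, comes from reasoning over many such records and studies, which a few lines of code cannot capture.

```python
# Hypothetical cut-off: 38.0 degrees C is assumed here purely for illustration.
FEVER_THRESHOLD_C = 38.0

def interpret_temperature(temp_c: float) -> str:
    """Turn a raw reading (data) into an interpretation (information)."""
    return "high" if temp_c >= FEVER_THRESHOLD_C else "normal"

readings = [36.8, 38.4, 37.2]                           # data: raw values (invented)
labels = [interpret_temperature(t) for t in readings]   # information: interpreted values
print(list(zip(readings, labels)))
```
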
  • Data Mining Process

    • Definition: Extracting meaningful information from data.
    • Knowledge Discovery in Databases (KDD) process:
      • Raw data collection, pre-processing, data mining (clustering/classification), and evaluation.
    • Emphasis on structured data: rows (records) and columns (attributes).
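
    A minimal sketch of structured data as rows (records) and columns (attributes), with one pre-processing step from the KDD process. The attribute names, the values, and the choice to drop incomplete records (rather than, for example, impute missing values) are assumptions for illustration only.

```python
# Structured data: each dict is a record (row); each key is an attribute (column).
# Attribute names and values are hypothetical.
raw_records = [
    {"age": 54, "glucose": 7.9, "systolic_bp": 142},
    {"age": 61, "glucose": None, "systolic_bp": 128},  # missing glucose value
    {"age": 47, "glucose": 5.2, "systolic_bp": 118},
]

def preprocess(records):
    """One possible pre-processing step: drop records with any missing attribute."""
    return [r for r in records if all(v is not None for v in r.values())]

clean = preprocess(raw_records)
attributes = list(clean[0].keys())
print(f"{len(clean)} records x {len(attributes)} attributes: {attributes}")
```
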
  • Data Mining Tasks

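    • Distinction between descriptive tasks (e.g., clustering, which describes structure in the data) and predictive tasks (e.g., classification, which predicts an outcome).
    • Classification: Learning patterns from records with known outcomes to predict the outcomes of new records.
      • Training set: Records with known outcomes, used to build the model.
      • Test set: Records not used in training; the model predicts their outcomes, which are then compared with the true outcomes to measure performance.
    • Evaluation: A confusion matrix tabulates true positives, false positives, true negatives, and false negatives; accuracy and related metrics are computed from it.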
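
    The training/test workflow and the confusion matrix can be sketched in Python as follows. The glucose values and labels are invented, and the "model" is a deliberately simple, hypothetical rule (a cut-off halfway between the two class means in the training set) standing in for a real classification algorithm.

```python
from collections import Counter

# Hypothetical (glucose, has_diabetes) pairs; all values are invented for illustration.
train = [(5.0, 0), (5.4, 0), (6.1, 0), (7.8, 1), (8.5, 1), (9.2, 1)]
test = [(5.2, 0), (6.9, 1), (7.1, 0), (8.8, 1)]

def fit_threshold(data):
    """Hypothetical 'model': a cut-off halfway between the two class means."""
    neg = [x for x, y in data if y == 0]
    pos = [x for x, y in data if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2

def confusion_matrix(y_true, y_pred):
    """Count TP, FP, TN, FN, treating 1 as the positive class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["TP" if t == 1 else "FP"] += 1
        else:
            counts["FN" if t == 1 else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}

threshold = fit_threshold(train)                        # built from the training set only
y_pred = [1 if x >= threshold else 0 for x, _ in test]  # predictions for unseen records
y_true = [y for _, y in test]                           # known outcomes, used only to evaluate
print(f"threshold={threshold:.2f}", confusion_matrix(y_true, y_pred))
```
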
  • Model Evaluation

    • Accuracy calculation:
      Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
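    • Sensitivity (true positive rate) and specificity (true negative rate) are also calculated, especially when classes are imbalanced, because a high accuracy can hide poor detection of the minority class:
      Sensitivity = \frac{TP}{TP + FN}, \quad Specificity = \frac{TN}{TN + FP}
    • Trade-offs: Adjusting the classification threshold shifts the balance between sensitivity and specificity; lowering it usually raises sensitivity at the cost of specificity.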
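
    A sketch of the three metrics and the threshold trade-off, assuming a hypothetical classifier that outputs a score per patient and predicts "disease" when the score is at or above the threshold. The scores and labels are invented.

```python
# Hypothetical classifier scores and true labels (1 = disease, 0 = no disease).
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

def metrics_at(threshold):
    """Accuracy, sensitivity and specificity when predicting 1 for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity

for t in (0.75, 0.5, 0.25):
    acc, sens, spec = metrics_at(t)
    print(f"threshold={t:.2f}  accuracy={acc:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

    On this toy data, accuracy stays at 0.75 for all three thresholds while sensitivity and specificity move in opposite directions, which is why accuracy alone can hide the trade-off.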
  • Importance of Interpretability in Classification Algorithms

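    • White box models (e.g., decision trees), whose reasoning can be inspected, vs. black box models (e.g., neural networks), whose internal logic is hard to interpret.
    • White box models are often preferred in healthcare because clinicians can understand and check how a prediction was reached.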
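
    To illustrate why white box models are easier to explain, here is a sketch of a one-level decision tree (a "decision stump") whose learned model is a single readable rule. This is a simplified, hypothetical learner: it picks the split with the fewest training errors, whereas standard decision tree algorithms choose splits using impurity measures such as entropy or the Gini index and grow deeper trees. The data are invented.

```python
# Hypothetical training data: (attributes, label). All values are invented.
records = [
    ({"glucose": 5.1, "bmi": 22}, "low risk"),
    ({"glucose": 5.8, "bmi": 31}, "low risk"),
    ({"glucose": 7.6, "bmi": 24}, "high risk"),
    ({"glucose": 8.3, "bmi": 29}, "high risk"),
]

def learn_stump(data):
    """Find the single attribute/threshold split with the fewest training errors."""
    best = None
    for attr in data[0][0]:
        values = sorted({x[attr] for x, _ in data})
        # Candidate cut-offs lie halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            for above in ("high risk", "low risk"):
                below = "low risk" if above == "high risk" else "high risk"
                errors = sum((above if x[attr] > cut else below) != y for x, y in data)
                if best is None or errors < best[0]:
                    best = (errors, attr, cut, above, below)
    return best[1:]

attr, cut, above, below = learn_stump(records)
print(f"IF {attr} > {cut:.2f} THEN '{above}' ELSE '{below}'")
```

    On this toy data the learned model is the rule `IF glucose > 6.70 THEN 'high risk' ELSE 'low risk'`, which a clinician can read and check directly.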
  • Clustering Techniques

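    • Definition: Grouping similar data points based on their characteristics, without a target variable (unsupervised learning).
    • Examples:
      • Hierarchical clustering (building a dendrogram by successively merging or splitting clusters).
      • K-means and PAM (Partitioning Around Medoids): distance-based partitioning into k clusters; k-means uses cluster means as centres, while PAM uses actual data points (medoids).
      • Fuzzy C-means: each point has a degree of membership in every cluster, so clusters can overlap.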
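
    A compact sketch of k-means (Lloyd's algorithm) in plain Python, showing the assign-then-update loop behind distance-based clustering. The points are invented, and initialising with randomly chosen data points is one common choice among several. PAM differs in that each cluster centre must be an actual data point (a medoid), and fuzzy C-means replaces the hard assignment below with membership degrees.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D measurements forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, clusters = kmeans(points, k=2)
for c, members in zip(centroids, clusters):
    print([round(v, 2) for v in c], members)
```

    With two well-separated groups like these, the two clusters found coincide with the two groups; on less clearly separated data, the result can depend on the initial centroids.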
  • Decision Making Under Uncertainty

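    • Introduction to Bayes' Theorem for updating probabilities based on evidence.
    • The prior probability (e.g., disease prevalence) is combined with a test's sensitivity and specificity to give the posterior probability of disease after a positive or negative result.
    • In a multi-test scenario, the posterior from one test becomes the prior for the next, which can substantially improve diagnostic accuracy (assuming the tests are conditionally independent).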
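
    For a positive test result, Bayes' theorem gives P(D | +) = sensitivity × prior / (sensitivity × prior + (1 - specificity) × (1 - prior)). The sketch below applies it once and then again for a second positive test, reusing the first posterior as the new prior. The prevalence, sensitivity, and specificity figures are hypothetical, and the second step assumes the two tests are conditionally independent given disease status.

```python
def posterior(prior, sensitivity, specificity, positive=True):
    """P(disease | test result) via Bayes' theorem."""
    if positive:
        p_result_given_disease = sensitivity      # P(+ | D)
        p_result_given_healthy = 1 - specificity  # P(+ | not D), false positive rate
    else:
        p_result_given_disease = 1 - sensitivity  # P(- | D), false negative rate
        p_result_given_healthy = specificity      # P(- | not D)
    numerator = p_result_given_disease * prior
    evidence = numerator + p_result_given_healthy * (1 - prior)
    return numerator / evidence

# Hypothetical figures: 1% prevalence, 90% sensitivity, 95% specificity.
p1 = posterior(0.01, 0.90, 0.95)  # after one positive test
p2 = posterior(p1, 0.90, 0.95)    # second positive test; assumes conditional independence
print(f"after test 1: {p1:.3f}, after test 2: {p2:.3f}")
```

    With these figures, one positive result raises the probability of disease from 1% to about 15% (exactly 2/13), and a second positive result raises it to about 77%.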
  • Key Takeaways and Homework

    • Students are encouraged to review the materials, especially the case studies and examples covered in class.
    • Homework:
      • Download and review the 2021 past exam paper from Moodle.
      • Attempt questions three and four for discussion in Thursday's session.