Flashcards generated from lecture notes.
What is data in the context of healthcare?
Raw facts or measurements without context. Examples: blood pressure reading 180/110, temperature 38.5°C.
What is information in the context of healthcare?
Data with meaning and context. Example: Blood pressure 180/110 is classified as "high blood pressure."
What is knowledge in the context of healthcare?
Understanding what information means and what actions to take. Example: High blood pressure increases heart attack risk and requires treatment.
Identify data, information, and knowledge: A patient's cholesterol level is 250 mg/dL. The doctor says this is high and recommends statins.
Data: Cholesterol level of 250 mg/dL. Information: This level is classified as "high." Knowledge: High cholesterol needs treatment with statins to reduce heart disease risk.
What is KDD (Knowledge Discovery in Databases)?
The process of discovering useful knowledge from data through 4 steps:
1) Collect data,
2) Clean and prepare it,
3) Find patterns,
4) Turn patterns into useful knowledge.
What are the main types of medical data?
Six main types:
1. Narrative (clinical notes)
2. Structured text (standardized forms)
3. Numerical measurements (lab values)
4. Signal data (ECG, EEG)
5. Images (X-rays, MRIs)
6. Genetic information
What are the four types of measurement scales used in healthcare data?
Nominal scales: Categories with no order (e.g., blood types)
Ordinal scales: Categories with a meaningful order (e.g., pain scale 1-10)
Interval scales: Equal intervals but no true zero (e.g., temperature in Celsius)
Ratio scales: Equal intervals with a meaningful zero (e.g., weight, height)
What is PPG (Photoplethysmography)?
Technology that measures blood volume changes in vessels using light to determine heart rate and other cardiovascular metrics.
Explain how PPG sensors measure heart rate.
1) LED light shines onto skin
2) Blood absorbs light proportionally to blood volume
3) Photodetector measures reflected light
4) Changes in light intensity (more absorption during heartbeats, less between beats) are processed to calculate heart rate
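The four steps above can be illustrated with a deliberately simplified sketch. It assumes the signal has already been cleaned and oriented so that each heartbeat appears as a peak, and it uses a synthetic sine wave in place of real sensor data. The function name `estimate_heart_rate` is hypothetical, and real devices add filtering and motion-artifact handling that this omits.

```python
import math


def estimate_heart_rate(signal, fs):
    """Estimate heart rate (beats per minute) by counting peaks.

    signal: light-intensity samples, assumed oriented so each beat is a peak
    fs: sampling rate in Hz
    """
    mean = sum(signal) / len(signal)
    peaks = []
    for i in range(1, len(signal) - 1):
        # Treat a local maximum above the signal mean as one heartbeat
        if signal[i] > mean and signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]:
            peaks.append(i)
    if len(peaks) < 2:
        return None  # not enough beats to measure an interval
    # Average inter-beat interval in seconds, converted to beats per minute
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals) / len(intervals))


# Synthetic 10-second signal sampled at 50 Hz with a 1.2 Hz pulse (72 bpm)
fs = 50
signal = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(10 * fs)]
print(round(estimate_heart_rate(signal, fs)))
```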
How does an Apple Watch use PPG to measure heart rate?
It uses green LED lights and photodetectors to detect blood flow variations in the wrist, applying signal processing algorithms to calculate heart rate from the pattern of light absorption.
What are the advantages and limitations of PPG-based devices?
Advantages: Inexpensive, low power consumption, portable, convenient to wear.
Limitations: Sensitive to motion artifacts, less precise than ECG for detailed analysis.
What are the limitations of green light PPG?
Green light is strongly absorbed by the skin (weakening the signal), is affected by skin tone (melanin absorbs green light), and cannot reach deeper tissue because hemoglobin strongly absorbs it.
What are the differences between green light and infrared light PPG sensors?
Green light PPG sensors have better signal-to-noise ratio and resistance to motion artifacts but are affected by skin tone and can't penetrate deep tissue. Infrared light PPG sensors penetrate 10x deeper into tissues, are less affected by skin characteristics (melanin, tattoos), but require more advanced signal processing to filter motion noise.
What advantages do infrared light PPG sensors offer?
Infrared light can penetrate much deeper into tissue, is less affected by skin tone variations, and can measure additional biometrics beyond heart rate (such as oxygen saturation, hydration, and muscle oxygen).
Why is green light often used in wrist-based PPG sensors?
Green light provides good signal-to-noise ratio and resistance to motion artifacts, making it effective for wrist-based measurements despite being less penetrating than infrared.
What is Pulse Wave Velocity (PWV)?
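The speed at which the pressure wave moves along an artery. It is calculated as PWV = distance / time delay, where the distance is measured between two points along the artery and the time delay is the lag between the pulse waves recorded at those points.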
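A minimal sketch of the PWV formula. The function name and the example distance and delay are hypothetical values chosen only to show the arithmetic.

```python
def pulse_wave_velocity(distance_m, time_delay_s):
    """PWV in m/s: distance between two arterial sites / pulse time delay."""
    if time_delay_s <= 0:
        raise ValueError("time delay must be positive")
    return distance_m / time_delay_s


# Hypothetical measurement: sites 0.5 m apart, pulse arrives 0.05 s later
print(pulse_wave_velocity(0.5, 0.05))
```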
What is Heart Rate Variability (HRV)?
The variation in the time intervals between successive heartbeats (the inter-beat intervals).
How does HRV relate to stress levels?
Lower variation between heartbeats generally indicates higher stress; higher variation indicates relaxation.
What are the key HRV metrics used for stress monitoring?
AVNN (average of inter-beat intervals), SDNN (standard deviation of intervals), RMSSD (root mean square of successive differences).
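The three metrics can be computed from a list of inter-beat (RR) intervals as sketched below. The intervals are hypothetical, and the sketch uses the sample standard deviation for SDNN; some tools use the population version instead.

```python
import math
import statistics


def hrv_metrics(rr_ms):
    """Compute AVNN, SDNN and RMSSD from inter-beat intervals in milliseconds."""
    avnn = statistics.mean(rr_ms)        # average interval
    sdnn = statistics.stdev(rr_ms)       # sample standard deviation of intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"AVNN": avnn, "SDNN": sdnn, "RMSSD": rmssd}


rr = [800, 810, 790, 820, 780]  # hypothetical RR intervals (ms)
print(hrv_metrics(rr))
```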
What is the difference between supervised and unsupervised learning?
Supervised learning uses labeled data with known answers, while unsupervised learning finds patterns in unlabeled data without predefined categories.
What is descriptive data mining?
Finding patterns in existing data without making predictions. Examples: clustering and association rule mining.
What is predictive data mining?
Using historical data patterns to make predictions about future outcomes. Examples: classification and regression.
What is Classification in data mining?
Assigning items to predefined categories based on patterns learned from labeled training data.
What is Regression in data mining?
Finding a model (function) that maps a given input (attribute values) to a numeric prediction.
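Linear regression is one simple instance of this idea. The sketch below fits y = a + b·x by ordinary least squares for a single input attribute; the toy data are hypothetical and lie exactly on y = 2 + 3x, so the fit recovers those coefficients.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for y = a + b*x with one input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a, b, x):
    """Map an input attribute value to a numeric prediction."""
    return a + b * x


xs = [1, 2, 3, 4]    # hypothetical input attribute
ys = [5, 8, 11, 14]  # hypothetical numeric target
a, b = fit_linear_regression(xs, ys)
print(a, b, predict(a, b, 5))
```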
What are some applications of regression in healthcare?
Predicting patient recovery time, forecasting blood glucose levels, estimating hospital stay length, and calculating optimal drug dosages.
What is attribute selection in data mining?
Choosing only the useful features from data and ignoring irrelevant ones to improve model performance and reduce complexity.
What is attribute construction in data mining?
Creating new, more useful features from existing ones to help find patterns that aren't visible in the original data.
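As an illustration, body mass index (BMI) is a classic constructed attribute: neither weight nor height alone captures it, but their combination does. This example and the field names are assumptions, not taken from the lecture.

```python
def add_bmi(records):
    """Return copies of the records with a constructed "bmi" attribute.

    BMI = weight (kg) / height (m) squared, built from two existing attributes.
    """
    return [{**r, "bmi": r["weight_kg"] / r["height_m"] ** 2} for r in records]


# Hypothetical patient records
patients = [
    {"weight_kg": 80.0, "height_m": 2.0},
    {"weight_kg": 72.0, "height_m": 1.5},
]
for p in add_bmi(patients):
    print(p["bmi"])
```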
What are common preprocessing steps for healthcare data?
Filling in missing values, standardizing values to comparable scales, converting categories to numbers, removing outliers, and balancing class distributions.
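Three of these steps (filling in missing values, standardizing, and converting categories to numbers) are sketched below in plain Python. Mean imputation, z-scores, and one-hot encoding are common choices rather than the only ones, and the values are hypothetical; outlier removal and class balancing are not shown.

```python
import statistics


def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Z-score: (x - mean) / standard deviation; assumes values are not all equal."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def one_hot(categories):
    """Convert a categorical column into 0/1 indicator columns."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


glucose = [90, None, 110, 100]       # hypothetical values, one missing
filled = impute_mean(glucose)        # [90, 100, 110, 100]
z = standardize(filled)
blood_type = one_hot(["A", "O", "B", "A"])  # columns: A, B, O
print(filled, z, blood_type)
```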
How does class imbalance affect classification in healthcare?
With rare diseases, a model could achieve high accuracy by always predicting "healthy," making accuracy a misleading metric for performance.
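The effect is easy to reproduce with hypothetical numbers: with 10 diseased patients out of 1,000, a model that always predicts "healthy" scores 99% accuracy while detecting none of the cases.

```python
# Hypothetical test set: 990 healthy (0) and 10 diseased (1) patients
y_true = [0] * 990 + [1] * 10
y_pred = [0] * len(y_true)  # a "model" that always predicts healthy

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Sensitivity: share of diseased patients that the model flags
sensitivity = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / y_true.count(1)
print(accuracy, sensitivity)  # 0.99 0.0
```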
What is clustering in data mining?
Grouping similar data objects together while keeping dissimilar objects in different groups, without using predefined categories.
How does K-means clustering work?
A 4-step process:
1. Select k initial centroids
2. Assign each point to the nearest centroid
3. Recalculate centroids as the means of their assigned points
4. Repeat steps 2-3 until convergence (groups stop changing)
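The four steps map directly onto a short loop. This is a minimal sketch, assuming points are tuples of numbers and that the initial centroids are k randomly chosen data points (one common choice among several); the function name `kmeans` is illustrative.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of coordinate tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        # Step 4: stop once the assignments no longer change
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(pts, k=2)
print(centroids, labels)
```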
What is the objective function that K-means tries to minimize?
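The sum of squared distances between each point and its cluster center: $J(V) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$, where $C_j$ is the set of points assigned to cluster $j$ and $V = (\mu_1, \dots, \mu_k)$ are the cluster centers.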
How do you calculate the distance between a point and a centroid in K-means?
Use Euclidean distance:
1. Find the difference for each feature
2. Square each difference
3. Add all squared differences
4. Take the square root of the total
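The distance steps, and the K-means objective J(V) built from them, can be written as two small functions. The names are hypothetical, and `labels[i]` is assumed to hold the index of the cluster that point i belongs to.

```python
import math


def euclidean_distance(point, centroid):
    # Steps 1-2: difference for each feature, then square it
    squared = [(p - c) ** 2 for p, c in zip(point, centroid)]
    # Steps 3-4: add the squared differences and take the square root
    return math.sqrt(sum(squared))


def kmeans_objective(points, labels, centroids):
    """J(V): sum of squared distances from each point to its own cluster center."""
    return sum(euclidean_distance(p, centroids[j]) ** 2
               for p, j in zip(points, labels))


print(euclidean_distance((3, 4), (0, 0)))  # 5.0
```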
What is the difference between K-means and Partitioning Around Medoids (PAM)?
K-means uses calculated means as centroids which may not be actual data points; PAM uses existing data points (medoids) as centers.
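The difference shows up clearly on a small hypothetical cluster with one outlier: the mean is pulled toward the outlier and is not a data point, while the medoid (the point with the smallest total distance to the others) stays among the data. This sketch only computes a single center; full PAM also iteratively swaps medoids, which is not shown.

```python
import math


def mean_center(points):
    """K-means style center: coordinate-wise mean (may not be a data point)."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def medoid(points):
    """PAM-style center: the data point with the smallest total distance to all others."""
    return min(points, key=lambda m: sum(math.dist(m, p) for p in points))


cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (20, 0)]  # hypothetical; (20, 0) is an outlier
print(mean_center(cluster))  # (5.2, 0.0), not one of the points
print(medoid(cluster))       # (2, 0)
```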
What are "black-box" models in data mining?
Models that provide predictions without explaining their reasoning process (e.g., neural networks, SVMs).
What are "white-box" models in data mining?
Models that provide interpretable decision processes (e.g., decision trees, rule-based systems).
Why is model interpretability particularly important in healthcare applications?
Medical professionals need to understand and validate the reasoning behind predictions for patient safety, trust, and regulatory compliance.
What is a Confusion Matrix?
A table showing prediction performance with:
• True Positives (TP): Correctly predicted "Yes"
• False Positives (FP): Wrongly predicted "Yes"
• True Negatives (TN): Correctly predicted "No"
• False Negatives (FN): Wrongly predicted "No"
How do you calculate Accuracy from a confusion matrix?
Accuracy = (TP + TN) / (TP + TN + FP + FN)
How do you calculate Sensitivity (True Positive Rate)?
Sensitivity = TP / (TP + FN)
How do you calculate Specificity (True Negative Rate)?
Specificity = TN / (TN + FP)
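The three formulas can be checked on a small hypothetical set of labels and predictions, with 1 meaning "Yes" and 0 meaning "No". The function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = "Yes", 0 = "No")."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)  # (2, 1, 4, 1)
print(metrics(tp, fp, tn, fn))
```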
Why is accuracy alone insufficient for healthcare applications?
Different types of errors have different consequences. Missing a disease (false negative) can be fatal, while a false alarm (false positive) causes anxiety and unnecessary tests.
When should you prioritize sensitivity in medical tests?
When missing the disease is very dangerous and treatment is safe and effective. For example, screening for treatable cancers.
When should you prioritize specificity in medical tests?
When false positives are costly, for example when the treatment is risky, expensive, or has serious side effects.
If you increase a test's threshold, what happens to sensitivity and specificity?
Raising the threshold:
• Decreases sensitivity (more false negatives)
• Increases specificity (fewer false positives)
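The trade-off can be seen by sweeping the threshold over some hypothetical risk scores, assuming that higher scores indicate disease and that a score at or above the threshold counts as a positive prediction.

```python
def sens_spec_at_threshold(scores, labels, threshold):
    """Predict positive when score >= threshold; return (sensitivity, specificity)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    return tp / (tp + fn), tn / (tn + fp)


scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]  # hypothetical risk scores
labels = [1, 1, 1, 1, 0, 0, 0, 0]                  # 1 = diseased
for t in (0.35, 0.65):
    print(t, sens_spec_at_threshold(scores, labels, t))
# 0.35 -> (1.0, 0.5); 0.65 -> (0.5, 0.75)
```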
What is Cross-Validation?
A technique to evaluate model performance by partitioning data into multiple training and testing subsets.
How does k-fold cross-validation work?
Split data into k equal parts. Use k-1 parts for training and 1 part for testing. Repeat k times using a different part for testing each time. Average the results.
What is stratification in cross-validation?
Ensuring each fold has the same mix of classes as the full dataset, preventing bias from uneven distribution of classes.
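k-fold splitting and stratification can be combined in one short sketch: the indices of each class are dealt round-robin into the k folds so that every fold keeps roughly the dataset's class mix, and each fold is then used once for testing. The function names are hypothetical, and unlike most library implementations this version does not shuffle.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Return k lists of test indices, each with roughly the dataset's class mix."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    # Deal the indices of each class round-robin across the folds
    for y in sorted(by_class):
        for i in by_class[y]:
            folds[pos % k].append(i)
            pos += 1
    return folds


def cross_validation_splits(labels, k):
    """Yield (train_indices, test_indices); each fold is the test set exactly once."""
    folds = stratified_k_fold(labels, k)
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


labels = [0] * 80 + [1] * 20  # hypothetical: 20% positive class
for train, test in cross_validation_splits(labels, 5):
    print(len(train), len(test), sum(labels[i] for i in test))
```

In practice, a model would be trained on each `train` split, evaluated on the matching `test` split, and the k scores averaged.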
In 10-fold cross-validation with 1000 examples, how many examples are used for training in each round?
900 examples (90%) for training, 100 examples (10%) for testing in each round.
In 5-fold cross-validation with 5000 examples, how many examples are used for testing in each round?
1000 examples (20%) are used for testing in each round.
What is Leave-One-Out Cross-Validation (LOOCV)?
A special case of k-fold cross-validation where k equals the number of examples, so each example is used once as a test case.
When is Leave-One-Out Cross-Validation most appropriate?
When working with small datasets where maximizing the amount of training data is crucial.
For a dataset with 9,000 MRI images, if using 10-fold cross-validation, how many images would be used for training in each fold?
8,100 images (90% of 9,000) would be used for training in each fold.
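The fold-size arithmetic in these cards, including the LOOCV case where k equals the number of examples, reduces to one line. The sketch assumes the number of examples divides evenly by k; the function name is illustrative.

```python
def fold_sizes(n_examples, k):
    """(train, test) sizes per round of k-fold CV, assuming n_examples % k == 0."""
    test = n_examples // k
    return n_examples - test, test


print(fold_sizes(1000, 10))  # (900, 100)
print(fold_sizes(5000, 5))   # (4000, 1000)
print(fold_sizes(9000, 10))  # (8100, 900)
print(fold_sizes(30, 30))    # LOOCV with 30 examples: (29, 1)
```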
What is Bayes' Theorem for medical testing?
P(disease|positive test) = [sensitivity × P(disease)] / [sensitivity × P(disease) + (1-specificity) × P(no disease)]
What is prior probability in diagnostic testing?
The estimated probability of disease before testing (often the disease prevalence in the population).
What is posterior probability in diagnostic testing?
The updated probability of disease after incorporating test results.
Given a test with 95% sensitivity and 90% specificity for a disease with 1% prevalence, what is the probability of having the disease after a positive test?
P(D|T+) = (0.95×0.01)/[(0.95×0.01)+(0.1×0.99)] = 0.0095/0.1085 ≈ 8.76%
Why does disease prevalence significantly impact the predictive value of tests?
In low-prevalence populations, even highly specific tests will generate many false positives relative to true positives, reducing positive predictive value.
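A minimal sketch of the Bayes formula above reproduces the worked example and shows how the same test's positive predictive value changes with prevalence (the prior). The function name is hypothetical.

```python
def posterior_positive(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                # P(T+ and D)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(T+ and no D)
    return true_pos / (true_pos + false_pos)


print(round(posterior_positive(0.95, 0.90, 0.01), 4))  # 0.0876

# Same test applied to populations with different prevalence
for prev in (0.001, 0.01, 0.1):
    print(prev, round(posterior_positive(0.95, 0.90, prev), 3))
```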
How is a workflow formally defined?
WF = (T, P, C, A, S₀) where:
• T is the task set
• P is the precedence matrix
• C is the conflict matrix
• A is the precondition set
• S₀ is the initial state
What is a Precedence Matrix in workflow modeling?
A matrix P = (pᵢⱼ)ₘₓₘ where pᵢⱼ = 1 indicates task j must be completed before task i can start.
What is a Conflict Matrix in workflow modeling?
A matrix C = (cᵢⱼ)ₘₓₘ where cᵢⱼ = 1 indicates tasks i and j cannot be performed simultaneously.
What are the four possible states of a task in workflow modeling?
S(Tᵢ) = 0: Not executable, not executed
S(Tᵢ) = 1: Executable, not executed
S(Tᵢ) = 2: Not executable, executed
S(Tᵢ) = 3: Executable, executed
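The state encoding can be sketched in code under a simplifying assumption that is not from the lecture: a task is treated as executable once all of its predecessors in the precedence matrix have been executed, and the conflict matrix is only used to check whether two tasks may run at the same time. With that rule, S(Tᵢ) = executable + 2 × executed reproduces the four codes. Tasks are 0-indexed and the function names are hypothetical.

```python
def task_states(precedence, executed):
    """Return S(T_i) for every task given the set of executed task indices."""
    m = len(precedence)
    states = []
    for i in range(m):
        # Assumption: executable once every predecessor j (precedence[i][j] == 1) is done
        executable = all(j in executed for j in range(m) if precedence[i][j] == 1)
        # Encode: 0 = neither, 1 = executable only, 2 = executed only, 3 = both
        states.append(int(executable) + 2 * int(i in executed))
    return states


def can_run_together(conflict, i, j):
    # conflict[i][j] == 1 means tasks i and j cannot be performed simultaneously
    return conflict[i][j] == 0


# Hypothetical three-task chain: T0 before T1, T1 before T2
P = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
C = [[0, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]
print(task_states(P, executed=set()))  # [1, 0, 0]
print(task_states(P, executed={0}))    # [3, 1, 0]
print(can_run_together(C, 1, 2))       # False
```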