eHealth Data Mining Notes

Focus: Data mining in eHealth
Upcoming content:
- Explanation and application of data mining.
- Introduction to algorithms used in healthcare data analysis.
- Discussion on interpretability and trustworthiness in data mining applications, especially in health contexts.

Data mining is defined by Witter and Frank as the process of discovering meaningful patterns in data.
Recent Definition: Extracting useful information from data. This could mean anything from predicting disease incidence to uncovering hidden trends in patient data.

Data mining process begins with collecting relevant medical data.
- Data often requires preprocessing:
- Handling missing values.
- Creating new attributes from existing data.
- Redundant or irrelevant data should be removed to refine the dataset, a process known as attribute selection.
Example: In a dataset for breast cancer, attributes might include tumor size, age at diagnosis, and treatment history.

Structured Data:
- Data formatted into a table with records (rows) and features (columns).
- Each row represents individual patient data.

Relevancy and Maintainability:
- Example: Instead of recording a patient's age (which changes), record their date of birth (constant). This reduces future data maintenance.

Clustering:
- Method for identifying groups within datasets.
- Used in applications like market analysis or patient treatment responses.
Regression:
- Establishes a model to predict numerical outcomes over time using historical data.
- Example: Predicting flu infection rates in winter seasons.
Classification:
- Predicts the class of given instances based on attribute values. Uses training and test datasets.
- Implements algorithms like:
  - Artificial Neural Networks:
  - Mimics neural connections in the brain; suitable for complex datasets but less interpretable.
  - Support Vector Machines:
  - Seeks to find hyperplanes in data separating two categories.
  - Decision Trees:
  - Offers a visual representation of decisions; easy to interpret and useful in medical decisions.

Evaluate the effectiveness of models using a separate test set where outcomes are hidden.
Comparison of predicted outcomes against known outcomes measures the model's accuracy.

Performance assessment of algorithms.
Case studies related to real-world applications of data mining in health contexts.
Focus on the interpretability of results – essential for trust in healthcare data mining tools.

A special class on Friday indicating support for the first assignment, ensuring students refine their essays with specific queries addressed by the class supervisor, Simon.