M.Sc. Health Informatics: Research Methods
M.Sc. Research Methods: Statistics
This document summarizes the key concepts and methodologies of research methods in health informatics, focusing on qualitative and quantitative analysis, statistical paradigms, and the specific statistical tests used to analyze data.
Research Methods Overview
- Qualitative vs. Quantitative Methods
- Qualitative Methods: Explore depth of understanding through unstructured or semi-structured data such as interviews; emphasize content analysis and grounded theory.
- Quantitative Methods: Use structured tools such as surveys or experiments for statistical analysis; emphasize numerical data.
Grounded Theory
Grounded Theory is a qualitative research methodology that involves systematically gathering and analyzing data about a phenomenon in order to develop theories.
- Key Features:
- No Predefined Theoretical Framework: Grounded Theory allows findings to emerge from the data rather than forcing data to fit a preexisting theoretical framework.
- Open Sampling: Researchers begin with random samples to capture rich qualitative data that represent various perspectives.
- Open Coding: The process starts with identifying and labeling concepts from the data.
- Stop and Memo: Researchers take notes throughout the process to track insights and progress.
- Constant Comparison: Involves comparing new data with existing codes/categories to refine theories through triangulation of interviews, observations, and literature.
- Axial and Selective Coding: A subsequent stage where specific focus is placed on identifying connections and relationships in the data.
- Theoretical Saturation: Data collection continues until no new information or categories are generated.
- Historical Aspect: The work of Glaser and Strauss highlights the evolution of, and ongoing debate within, the methodology.
Empirical Design
Empirical Design involves strategies to structure research methodologies and includes:
- Cross-Sectional Design: An analysis conducted at a single point in time.
- Longitudinal Design: Research conducted over time to observe changes.
- Case Study: In-depth exploration of a single case or a few cases within a certain context.
- Comparative Design: Involves comparing different groups or conditions.
- Experiments: Manipulation of variables to determine effects.
Positivist Experimental Paradigm
The positivist paradigm is a quantitative approach that focuses on hypothesis-driven studies.
- Hypothesis: A statement predicting the relationship between variables (e.g., A causes B).
- Dependent Variable: The outcome measured in an experiment.
- Independent Variable: The variable manipulated by the researcher to observe effects.
- Control Mechanisms:
- Aim to eliminate all other variables that could influence the dependent variable, often referred to as ensuring that “nothing but A causes B.”
- Types of Control Mechanisms:
- Eliminate Factors: Remove confounding variables.
- Hold Constant: Keep certain variables fixed.
- Random Selection of Subjects: Ensures participants represent a larger population.
- Control Group: A baseline to compare changes against experimental groups.
- Blinding: Ensures that researchers and participants are unaware of the treatment assignments to prevent bias.
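As a minimal illustration of random allocation related to the control mechanisms above (the participant IDs and group sizes are hypothetical), the sketch below shuffles a participant pool and splits it into a control and an experimental group:

```python
import random

# Hypothetical pool of participant IDs
participants = [f"P{i:02d}" for i in range(1, 13)]

random.shuffle(participants)       # randomize the order of participants
control = participants[:6]         # first half serves as the control group
experimental = participants[6:]    # remaining half receives the intervention
print("Control:     ", control)
print("Experimental:", experimental)
```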
Importance of Statistics
Statistics play a crucial role in understanding variability and making inferences.
- Variance: Indicates how much values in a dataset spread out from the mean.
- Statistical Probability: Evaluates how likely it is that the results occurred by chance; the less likely they are to be due to chance, the greater the confidence in them (see the sketch after this list).
- Levels of Significance:
- Typically expressed as p-values, e.g.,
- p = 0.01: a 1% probability that the results are due to chance.
- p = 0.05: a 5% probability that the results are due to chance.
- Acceptable significance levels depend on the context of the research.
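A minimal sketch of the two ideas above, using the Condition 1 scores from the summary-statistics example later in this document and a purely illustrative p-value of 0.03:

```python
from statistics import mean, variance

scores = [10, 6, 5, 6, 4, 4, 2, 5]   # Condition 1 scores from the example below
print("Mean:", mean(scores))                             # 5.25
print("Sample variance:", round(variance(scores), 2))    # ~5.36: spread around the mean

# Judging significance: compare a test's p-value against the chosen level
p_value = 0.03   # illustrative p-value from some hypothetical test
alpha = 0.05     # chosen significance level
print("Significant at the 5% level:", p_value < alpha)
```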
Data Analysis Techniques
Summary Statistics
This method provides a quick overview of raw scores such as:
- Mean scores of conditions for comparison.
- Example Scores:
- Condition 1 Scores: [10, 6, 5, 6, 4, 4, 2, 5] → Total: 42, Mean: 5.25
- Condition 2 Scores: [1, 3, 9, 8, 7, 5, 4] → Total: 37, Mean: ≈5.29
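A minimal sketch recomputing the totals and means above with Python's standard statistics module (no external packages needed):

```python
from statistics import mean

condition_1 = [10, 6, 5, 6, 4, 4, 2, 5]
condition_2 = [1, 3, 9, 8, 7, 5, 4]

print(sum(condition_1), mean(condition_1))             # 42, 5.25
print(sum(condition_2), round(mean(condition_2), 2))   # 37, 5.29
```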
Inferential Statistics
Inferential statistics test hypotheses drawn from the data; the focus here is on enabling researchers to select appropriate statistical tests and interpret their outcomes.
- Parametric Techniques: Used when data meets certain assumptions (interval data, normal distribution, homogeneity of variances).
- Non-parametric Techniques: Applied when data does not meet those assumptions.
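One way to check the parametric assumptions listed above is with the Shapiro-Wilk test (normality) and Levene's test (homogeneity of variances) in scipy; the group scores below are illustrative only:

```python
from scipy import stats

# Hypothetical interval-level scores for two independent groups
group_a = [10.1, 6.3, 5.2, 6.8, 4.4, 4.9, 2.5, 5.1]
group_b = [1.2, 3.4, 9.1, 8.3, 7.6, 5.5, 4.8, 6.0]

# Normality within each group: p > 0.05 gives no evidence against normality
print(stats.shapiro(group_a))
print(stats.shapiro(group_b))

# Homogeneity of variances across groups
print(stats.levene(group_a, group_b))
```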
Selection Criteria for Statistical Tests
- Two-tailed vs. One-tailed Tests: Determines whether to look for significant differences in both directions or just one.
- Within-Subject vs. Between-Subject Designs:
- Within Subjects: Same participants across all conditions; may risk carry-over effects.
- Between Subjects: Different groups for each condition; can mitigate carry-over but may introduce variability.
Non-parametric Decision Trees
Test Selection for Non-parametric Analysis
- Two conditions:
- Same Subjects → Wilcoxon test
- Different Subjects → Mann-Whitney U test
- More than Two conditions:
- Same Subjects → Friedman test
- Different Subjects → Kruskal-Wallis test
- Correlation: Spearman rank coefficient used for analyzing relationships.
- Categorical Counts: Chi-square tests used for categorical data analysis.
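The decision tree above can be expressed as a small helper function; this is only a sketch of the selection logic, with test names returned as strings:

```python
def choose_nonparametric_test(n_conditions: int, same_subjects: bool) -> str:
    """Return the non-parametric test suggested by the decision tree."""
    if n_conditions < 2:
        raise ValueError("At least two conditions are needed for a comparison")
    if n_conditions == 2:
        return "Wilcoxon" if same_subjects else "Mann-Whitney U"
    return "Friedman" if same_subjects else "Kruskal-Wallis"

print(choose_nonparametric_test(2, same_subjects=False))  # Mann-Whitney U
print(choose_nonparametric_test(3, same_subjects=True))   # Friedman
```

Correlation (Spearman) and categorical counts (chi-square) sit outside this two-branch tree and are covered separately below.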
Non-parametric General Principles
- Preparing the Data: Scores must be structured to suit the chosen test (e.g., paired columns for related designs, separate groups for independent designs).
- Ranking Data: Address ties and calculate rankings within conditions.
Assigning Ranks
Scores are ranked within or across conditions as each test requires; accurate rank assignment, including the handling of tied scores, is essential because test statistics such as Wilcoxon's T and Mann-Whitney's U are computed directly from these ranks (see the ranking sketch below).
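A minimal sketch of rank assignment with ties, using scipy's rankdata (which gives tied scores the average of the ranks they span); the scores reuse the Condition 1 values from the earlier example:

```python
from scipy import stats

scores = [10, 6, 5, 6, 4, 4, 2, 5]
print(stats.rankdata(scores))
# -> [8.  6.5 4.5 6.5 2.5 2.5 1.  4.5]
# e.g. the two 4s occupy ranks 2 and 3, so each receives the average rank 2.5
```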
Example of Wilcoxon Test Application
- Wilcoxon Test Mechanism:
- Rank the absolute differences between paired scores, then sum the ranks of the positive and the negative differences separately; the smaller of the two rank sums is the test statistic T.
- Use statistical tables of critical values to interpret T and judge significance.
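A minimal sketch of this mechanism on hypothetical paired scores (the values are illustrative), computing the smaller rank sum T by hand and cross-checking against scipy's implementation:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same participants under two conditions
cond_a = [62, 71, 55, 48, 66, 59, 74, 52]
cond_b = [58, 74, 47, 49, 60, 50, 69, 45]

diffs = np.array(cond_a) - np.array(cond_b)
nonzero = diffs[diffs != 0]              # zero differences are discarded
ranks = stats.rankdata(np.abs(nonzero))  # rank the absolute differences
w_plus = ranks[nonzero > 0].sum()        # rank sum of positive differences
w_minus = ranks[nonzero < 0].sum()       # rank sum of negative differences
T = min(w_plus, w_minus)                 # smaller rank sum is the Wilcoxon statistic
print("T =", T)

# Cross-check: scipy's Wilcoxon signed-rank test reports the same statistic
print(stats.wilcoxon(cond_a, cond_b))
```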
Key Statistical Tests and Their Applicability
- Mann-Whitney U Test: Suitable for comparing two independent groups with no assumptions of normality.
- Friedman Test: An extension of the Wilcoxon test for more than two related groups.
- Kruskal-Wallis Test: Non-parametric version of ANOVA for comparing more than two independent groups.
- Jonckheere Test: Used to determine trends across ordered groups, specifically observing if data reflects an increasing or decreasing trend.
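A minimal sketch of how three of these tests might be run with scipy (the Jonckheere trend test has no scipy equivalent and is omitted); the group scores are illustrative, and for the Friedman test the three lists are treated as repeated measures on the same five participants:

```python
from scipy import stats

g1 = [10, 6, 5, 6, 4]   # illustrative scores, group/condition 1
g2 = [1, 3, 9, 8, 7]    # group/condition 2
g3 = [2, 4, 4, 5, 6]    # group/condition 3

# Two independent groups
print(stats.mannwhitneyu(g1, g2, alternative="two-sided"))
# Three or more independent groups (non-parametric analogue of ANOVA)
print(stats.kruskal(g1, g2, g3))
# Three or more related conditions (same participants per row)
print(stats.friedmanchisquare(g1, g2, g3))
```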
Correlations and Their Measurement
- Spearman Rank Correlation Coefficient: Measures strength and direction of association between two ranked variables.
- Formula: R = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}, where d is the difference between each pair of ranks and N is the number of pairs.
- Interpreting Correlation Coefficients: Ranges from -1 (perfect negative) to +1 (perfect positive).
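A minimal sketch applying the formula by hand and cross-checking against scipy.stats.spearmanr; the paired scores are illustrative and contain no tied ranks, so both calculations agree:

```python
import numpy as np
from scipy import stats

x = [86, 72, 65, 59, 44, 31]   # illustrative paired scores
y = [88, 70, 60, 64, 40, 35]

rx, ry = stats.rankdata(x), stats.rankdata(y)
d = rx - ry                    # difference between paired ranks
n = len(x)
rho = 1 - (6 * np.sum(d ** 2)) / (n * (n ** 2 - 1))
print("R (formula):", rho)     # ~0.943

rho_scipy, p = stats.spearmanr(x, y)
print("R (scipy):  ", rho_scipy)
```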
Chi-Square Test Analysis
- Purpose: Tests association between categorical variables by comparing observed frequencies to expected frequencies.
- Appropriate Use: Requires sufficient sample size, generally >20 subjects for reliability.
- Application: Chi-square formula applied as follows:
\chi^2 = \sum \frac{(O - E)^2}{E}, where O is the observed and E the expected frequency.
Expected Frequencies Calculation
Expected frequencies are derived from the marginal totals: for each cell, expected = (row total × column total) / grand total, so the expected counts reflect the proportions that would occur if there were no association.
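A minimal sketch with scipy.stats.chi2_contingency on an illustrative 2x2 table (the counts are made up); the continuity correction is switched off so the statistic matches the formula above, and the function also returns the expected frequencies computed from the marginal totals:

```python
from scipy import stats

# Illustrative 2x2 contingency table: rows = group, columns = outcome counts
observed = [[18, 7],
            [11, 14]]

# correction=False -> plain sum of (O - E)^2 / E, matching the formula above
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)

print("Expected frequencies:\n", expected)   # row total * column total / grand total
print("chi-square =", round(chi2, 3), ", df =", dof, ", p =", round(p, 3))
```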
Statistical Review and Key Takeaways
The lecture provides a foundational understanding of the statistical methods and models used in health informatics and their application to real-world research design. Emphasis is placed on selecting appropriate techniques, interpreting statistical outputs, and using software such as SPSS to apply the techniques in practice.